Abstract. We propose an exact technique to calculate lower bounds of spectral
gaps of discrete time reversible Markov chains on finite state sets. Spectral 
gaps are a common tool for evaluating convergence rates of Markov chains. As 
an illustration, we successfully use this technique to evaluate the "absorption 
time" of the "Backgammon model", a paradigmatic model for glassy dynamics. 
We also discuss the application of this technique to the "Contingency table 
problem", a notoriously difficult problem from probability theory. The interest 
of this technique is that it connects spectral gaps, which are quantities related to 
dynamics, with static quantities, calculated at equilibrium. 
Contingency tables.

Markov chains [1] have applications in many areas, ranging from pure
probability theory to theoretical and numerical statistical physics. For example, in
out-of-equilibrium statistical physics, they are commonly used to formalize the time
evolution of physical models. In this paper we study, as a particular application of
our technique, the "Backgammon model", a well-known paradigmatic mean-field model
for glassy dynamics. Markov chains are also widespread in numerical statistical
physics: Monte Carlo algorithms with Metropolis or Glauber dynamics, which are
examples of Markov processes, are of great importance in the computational study of
complex systems. The efficiency of such algorithms relies on their rates of
convergence towards their equilibrium distributions. The ergodic times must be as
short as possible to save computational time. Usually, such algorithms are said to be
rapid if their ergodic times are polynomial in the system size, whereas (generally
speaking) the number of configurations grows exponentially with the system size.
Monte Carlo algorithms are also increasingly widespread in applied mathematics: the
"Contingency table problem" [7], which we examine at the end of the present paper,
provides an example.

Roughly speaking, a Markov process $\mathcal{M}$ needs a characteristic time $\tau$ to
be close to equilibrium (the equilibration time). If $P$ denotes the transition matrix
(defined below) associated with $\mathcal{M}$, and if $g(P)$ is the spectral gap of $P$,
that is to say the difference between the two largest eigenvalues of $P$, then
$\tau \sim 1/g(P)$. We explain this point and give references below. Therefore, if one
calculates a lower bound on $g(P)$, one also gets an upper bound on $\tau$. More
precisely, if $1/g(P)$ is polynomial in the system size, then $\tau$ is also polynomial
and the chain is rapid.

Many efficient techniques have been developed to estimate (or to bound) mixing
times and spectral gaps when elementary techniques from linear algebra do not apply.
Among many others, we mention the "Coupling" technique, the "Canonical path" and
"Conductance" arguments, and the "Chain decomposition" method [10]. This last method
relies on the following basic idea: (i) decompose the state space into disjoint
smaller pieces; (ii) prove that the dynamics is rapid on each piece considered in
isolation; (iii) re-compose the dynamics on the whole state space from that on the
pieces and prove that it is rapid in turn.

We insist on this method because our "Multi-decomposition" technique is an
extension of it. More precisely, the idea of the chain decomposition method is
schematically the following: suppose that the state space $\Omega$ of the Markov
chain can be "naturally" decomposed into several disjoint subsets $\Omega_a$,
$a = 1, \ldots, A$; then "cut" all the transitions between different subsets $\Omega_a$
and only keep the transitions inside each subset. The restricted Markov chain so
obtained on each $\Omega_a$ is denoted by $\mathcal{M}_a$. Suppose then that one can
prove that each $\mathcal{M}_a$ is rapid on $\Omega_a$, and that in addition the
original dynamics is rapid "between" subsets. Then, under certain conditions, the
original dynamics on $\Omega$ will also be rapid. Unfortunately, even if one can easily
prove that the dynamics is rapid on each $\Omega_a$, it can be extremely difficult or
even impossible to handle the second step of the proof. In reference [11], an idea
which bypasses this difficulty was presented in the case of random rhombus tilings. In
the present paper, we generalize this idea in a formalized form and we show that it
has a much wider domain of application than random tilings.

As compared to the previous "simple" decomposition method, the present
"multi-decomposition" scheme uses several ($m$) decompositions of $\Omega$, namely
$(\Omega_{1;a_1}), (\Omega_{2;a_2}), \ldots, (\Omega_{m;a_m})$. For each decomposition
indexed by $k = 1, \ldots, m$, the set $\Omega$ is decomposed into disjoint subsets
$\Omega_{k;a}$. As in the decomposition method, one first needs to prove that the
dynamics is rapid in each subset $\Omega_{k;a}$, for each $k$ and each $a$. Then, if
any two decompositions "overlap sufficiently", in a sense that will be made precise
below, the dynamics will also be rapid on the whole set $\Omega$. In other words, given
a decomposition $(\Omega_{k;a})$, one no longer has to prove that the original dynamics
is rapid between subsets of this decomposition. This point is ensured instead by the
remaining subsets $\Omega_{l;b}$, $l \neq k$. These ideas are schematically exemplified
in figure 1.

We illustrate this delicate point with a simple example that will also be developed
in greater detail below: consider a two-dimensional random walker on a square grid
$p_1 \times p_2$ (see figure 2). At each step, the walker performs either a vertical or
a horizontal move, with equal probability 1/4 in each direction (we neglect the
question of boundaries at this level of discussion). Then, in our formalism, we define
two decompositions $(\Omega_{1;a})$ and $(\Omega_{2;b})$, respectively a vertical and a
horizontal one. In the vertical decomposition, each subset $\Omega_{1;a}$,
$a = 1, \ldots, p_1$, is a vertical segment, with no possible transitions towards the
neighboring vertical segments: the horizontal edges are "cut" in this vertical
decomposition. Symmetrically, vertical edges are "cut" in the horizontal
decomposition, and the subsets $\Omega_{2;b}$, $b = 1, \ldots, p_2$, are horizontal
segments. Now on each subset $\Omega_{1;a}$ or $\Omega_{2;b}$, the induced dynamics on
vertical or horizontal segments is nothing but that of a one-dimensional random
walker which stays at the same place with probability 1/2 and moves in either
direction with probability 1/4. For such a random walker on segments of lengths $p_k$,
the spectral gaps are $g_k \simeq \mathrm{Cst}/p_k^2$: the dynamics is rapid on each
subset of each decomposition. Then the method developed in this paper can be applied
to this problem. The underlying idea is that any trajectory of the walker is a
succession of vertical and horizontal moves. Therefore the dynamics on $\Omega$ is a
combination of the dynamics on the different subsets. If the dynamics is rapid on each
subset, it will certainly be rapid on $\Omega$. More formally, one proves via the
present technique that

$$ \tau \leq \sup_k \tau_k, \tag{1} $$

where $\tau_k \sim 1/g_k$ is the equilibration time on the subsets of the decomposition
$k$. In other words, the dynamics on the whole set $\Omega$ is faster than the dynamics
on the slowest subset. In this case, this inequality can be checked by independent
means and appears to be an equality. In the present paper, we rigorously formalize
these ideas from a more general point of view.

Figure 1. Schematic representation of a multi-decomposition in the case of 2
decompositions. There are 20 possible states in $\Omega$, represented by small circles,
and the edges between these circles represent possible transitions between states.
The first decomposition contains 3 subsets $\Omega_{1;a}$ (full lines) and the second
one, 2 subsets $\Omega_{2;b}$ (dashed lines). In the first decomposition, we have
emphasized (thick lines) the edges that are cut because the transitions between states
of different subsets $\Omega_{1;a}$ are forbidden. If one can prove that the dynamics
is rapid in each subset of each decomposition then, under certain conditions, one can
conclude that the dynamics is rapid on the whole state set $\Omega$.






Figure 2. A 2D random walker on a square grid $p_1 \times p_2$. At time $t$, the walker
is situated on a vertex. At time $t + 1$, he chooses with the same probability to
move to a nearest neighbor, horizontally (full lines) or vertically (dashed lines),
if he can. There are two natural decompositions of this Markov chain, a vertical
and a horizontal one. In the horizontal decomposition, for example, the walker
can only move horizontally on subsets $\Omega_{2;b}$. Vertical edges are "cut".

The interest of this method is that it can be applied inductively. This idea
will be used throughout the paper. In the previous example, one can iterate the
multi-decomposition scheme in the case of random walkers on larger and larger
$m$-dimensional hyper-cubic grids of side lengths $p_1 \times \ldots \times p_m$. At each
stage $m$, there are $m$ decompositions, one for each dimension of space, and the
subsets are in turn $(m-1)$-dimensional hyper-cubic grids (see figure 3). Then one
proves by induction on $m$ that

$$ g(P^{(m)}) \geq \inf_{k=1,\ldots,m} g(P^{(1)}_k) \simeq \frac{\mathrm{Cst}}{\sup_k (p_k^2)}, \tag{2} $$

where $P^{(q)}$ is the transition matrix of a $q$-dimensional random walker. Once again,
this lower bound is in fact exact.

The organization of this paper is as follows: the first section introduces the 
background definitions and notations used throughout the paper. In particular, the 
notion of discrete time reversible Markov chain on a finite state set is briefly recalled. 
The second section presents the specific definitions and the main results of the paper, 
which are applied to several pedagogical elementary examples in section 3, namely 
random walkers on hyper-cubic lattices and on the symmetric group $S_n$.

In section 4, we successfully apply the present technique to two "Urn models",
the "Backgammon model" and the "Monkey urn model". In both cases, we relate
the dynamics on $m$ urns to the dynamics on 2 urns. We show that the source of the
slow dynamics of the Backgammon model is entirely contained in the 2-urn problem.
In other words, considering many urns instead of 2 does not bring any additional
slowness [12, 13]. Finally, the Contingency table problem is analyzed in section 5.
The last section 6 is devoted to conclusions.



1. Generalities 



1.1. Transition matrix $P$ of a Markov chain $\mathcal{M}$

The present paper deals with discrete time reversible Markov chains on finite state
sets. Let $\Omega$ be a finite set of cardinality $N$, $\mathcal{M}$ a Markov process on
$\Omega$, and $P$ its matrix of transition probabilities $P(x,y)$: $P(x,y) \geq 0$ is
the conditional probability that the chain is in the state $x$ at time $t + 1$ given
that it was in the state $y$ at time $t$. Note that these transition probabilities
themselves do not depend on time. Probabilities are conserved, so that
$\sum_x P(x,y) = 1$ for all $y$: $P$ is stochastic.

We define the state vector $e(t)$ at time $t$: it has coordinates
$(e(t,x))_{x \in \Omega}$, where $e(t,x)$ is the probability that the chain is in the
state $x$ at time $t$. Then by definition of the matrix $P$,

$$ e(t+1) = P e(t). \tag{3} $$

A state vector $\pi$ is said to be an equilibrium (or stationary) distribution if
$\pi = P\pi$, in other words if $\pi$ is an eigenstate of $P$ with eigenvalue 1.

The Markov process $\mathcal{M}$ is said to be reversible if it satisfies the detailed
balance condition [1]

$$ \pi(x) P(y,x) = \pi(y) P(x,y) \tag{4} $$

for all states $x$ and $y$. This condition ensures that $\pi$ is indeed an equilibrium
distribution for $\mathcal{M}$: summing (4) over $x$ and using $\sum_x P(x,y) = 1$ gives
$P\pi = \pi$. It will be useful in the following. Furthermore, we suppose that this
equilibrium distribution $\pi$ exists and is unique. In addition, we assume in the
sequel that the "loop" transitions $P(x,x)$ are at least 1/2 for all states $x$.
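
As an illustration of these definitions (a minimal sketch, not part of the original derivation), the following Python fragment builds the transition matrix of a one-dimensional walker on a segment, with the column convention used above ($P(x,y)$ is the probability of reaching $x$ from $y$), and checks stochasticity, detailed balance (4), stationarity of the uniform distribution and the condition $P(x,x) \geq 1/2$. The function names and the choice of a segment of 5 sites are illustrative assumptions; exact rational arithmetic is used so that the checks are equalities.

```python
from fractions import Fraction

def lazy_walk_1d(p):
    """P[x][y] = probability of being at x at time t+1 given y at time t.
    Each move has probability 1/4; a move leaving the segment is replaced
    by staying in place (hypothetical example chain)."""
    P = [[Fraction(0)] * p for _ in range(p)]
    for y in range(p):
        for x in (y - 1, y + 1):
            if 0 <= x < p:
                P[x][y] = Fraction(1, 4)
        # The diagonal term collects the remaining probability.
        P[y][y] = 1 - sum(P[x][y] for x in range(p))
    return P

def is_stochastic(P):
    n = len(P)
    return all(sum(P[x][y] for x in range(n)) == 1 for y in range(n))

def satisfies_detailed_balance(P, pi):
    n = len(P)
    return all(pi[x] * P[y][x] == pi[y] * P[x][y]
               for x in range(n) for y in range(n))

def is_stationary(P, pi):
    n = len(P)
    return all(sum(P[x][y] * pi[y] for y in range(n)) == pi[x]
               for x in range(n))

p = 5
P = lazy_walk_1d(p)
pi = [Fraction(1, p)] * p  # uniform distribution
checks = (is_stochastic(P),
          satisfies_detailed_balance(P, pi),
          is_stationary(P, pi),
          min(P[x][x] for x in range(p)) >= Fraction(1, 2))
print(checks)
```

All four checks hold for this chain; the same helper functions can be applied to any matrix stored with the same column convention.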



1.2. Eigenvalues and spectral gap of $P$

We assume for the moment that $\pi(x) > 0$ for all states $x$: the chain can
reach any state at equilibrium. Then we define the scalar product:

$$ \langle f | g \rangle = \sum_{x \in \Omega} \frac{1}{\pi(x)} f(x) g(x). \tag{5} $$

Because of reversibility, $P$ is self-adjoint for this scalar product:

$$ \langle P f | g \rangle = \sum_x \frac{1}{\pi(x)} \sum_y P(x,y) f(y) g(x) \tag{6} $$

$$ = \sum_y \frac{1}{\pi(y)} \sum_x P(y,x) g(x) f(y) \quad \text{(see eq. (4))} \tag{7} $$

$$ = \langle f | P g \rangle. \tag{8} $$

Thus the eigenvalues $\beta_i$, $i = 0, \ldots, N-1$, of $P$ are real. Moreover,
$|\beta_i| \leq 1$ for all $i$ by standard Perron-Frobenius theory. Since the
equilibrium distribution exists and is unique, $\beta_0 = 1$ and

$$ 1 > \beta_1 \geq \beta_2 \geq \cdots \geq \beta_{N-1} \geq -1. \tag{9} $$

In addition, we have supposed that $P(x,x) \geq 1/2$ for all $x$. As a consequence,
$\bar P = 2P - I$ (where $I$ is the identity matrix) is also a transition matrix with
non-negative transition probabilities, the eigenvalues $2\beta_i - 1$ of which are not
smaller than $-1$ according to (9). Hence $\beta_i \geq 0$ for all $i$. Now we can
define the spectral gap of $P$:

$$ g(P) = 1 - \beta_1 > 0. \tag{10} $$
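
The definition (10) can be explored numerically. The sketch below is an illustration under stated assumptions, not the method of this paper: it estimates $\beta_1$ by power iteration on the symmetrized matrix $S = \mathcal{D}^{-1/2} P \mathcal{D}^{1/2}$ (with $\mathcal{D}$ the diagonal matrix of the $\pi(x)$), which has the same eigenvalues as $P$ and is symmetric by detailed balance. Because all eigenvalues are non-negative when $P(x,x) \geq 1/2$, iterating in the orthogonal complement of the top eigenvector $\sqrt{\pi}$ converges to $\beta_1$. The function names, the starting vector and the number of iterations are arbitrary choices; the test chain is a one-dimensional walker on 6 sites, whose gap is known in closed form to be $(1 - \cos(\pi/p))/2$.

```python
import math

def lazy_walk_1d(p):
    """Column-stochastic P[x][y] of a walker on p sites: moves with
    probability 1/4 in each direction, stays put at the boundaries."""
    P = [[0.0] * p for _ in range(p)]
    for y in range(p):
        for x in (y - 1, y + 1):
            if 0 <= x < p:
                P[x][y] = 0.25
        P[y][y] = 1.0 - sum(P[x][y] for x in range(p))
    return P

def spectral_gap(P, pi, iterations=3000):
    """Estimate g(P) = 1 - beta_1 for a reversible chain with P(x,x) >= 1/2."""
    n = len(P)
    # Symmetrized matrix with the same spectrum as P.
    S = [[P[x][y] * math.sqrt(pi[y] / pi[x]) for y in range(n)]
         for x in range(n)]
    top = [math.sqrt(w) for w in pi]  # unit eigenvector for eigenvalue 1
    v = [math.sin(1.0 + 1.7 * x) for x in range(n)]  # arbitrary start
    beta = 0.0
    for _ in range(iterations):
        # Remove the equilibrium component, then normalize.
        overlap = sum(a * b for a, b in zip(v, top))
        v = [a - overlap * b for a, b in zip(v, top)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        Sv = [sum(S[x][y] * v[y] for y in range(n)) for x in range(n)]
        beta = sum(a * b for a, b in zip(v, Sv))  # Rayleigh quotient
        v = Sv
    return 1.0 - beta

p = 6
gap = spectral_gap(lazy_walk_1d(p), [1.0 / p] * p)
closed_form = (1.0 - math.cos(math.pi / p)) / 2.0
print(gap, closed_form)
```

Power iteration is chosen here only because it needs no external library; any symmetric eigenvalue solver applied to $S$ would serve the same purpose.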



1.3. Spectral gap and equilibration time $\tau$

In the introduction, we claimed that the characteristic time $\tau$ to be close to
equilibrium is of order $1/g(P)$, whatever the initial probability distribution $e(0)$.
Let us make this point precise, and in particular what we mean by "close". If the
chain was in the state $x \in \Omega$ at time 0, we denote by $e_x(t)$ the state vector
at time $t$.

In the literature, the distance between the probability distribution $e_x(t)$ and the
equilibrium one $\pi$ can be measured by various means. Among them, two norms are
commonly used, the Euclidean norm associated with the above scalar product and the
1-norm or variation distance:

$$ \Delta_x(t) = \frac{1}{2} \sum_{y \in \Omega} |e_x(t,y) - \pi(y)|. \tag{11} $$

Note that $\Delta_x(t) = \sup_{A \subseteq \Omega} |p(A,t) - \pi(A)|$, where $p(A,t)$
(resp. $\pi(A)$) is the probability that the system is in the subset $A$ of $\Omega$ at
time $t$ (resp. at equilibrium). Then, given $\varepsilon > 0$, we define the mixing
time $\tau_{\mathrm{var}}(\varepsilon)$ associated with this distance:

$$ \tau_{\mathrm{var}}(\varepsilon) = \max_x \min_{t_0} \{ t_0 \; / \; \forall t \geq t_0, \ \Delta_x(t) \leq \varepsilon \}. \tag{12} $$

In other words, whatever the initial state $x$, one is sure to stay within distance
$\varepsilon$ of equilibrium after $\tau_{\mathrm{var}}(\varepsilon)$ steps. Then
elementary algebra (see e.g. [14]) shows that:

$$ \tau_{\mathrm{var}}(\varepsilon) \leq \frac{1}{g(P)} \left( \ln\frac{1}{\varepsilon} + \max_x \ln\frac{1}{\pi(x)} \right). \tag{13} $$

This upper bound is of order $1/g(P)$, even though the second term of the prefactor of
$1/g(P)$ can be substantially big. However, this second term is usually polynomial in
the system size, at least at high temperature, where the distribution $\pi$ is uniform:
this second term is then simply the entropy of the system and, if it is extensive, it
is polynomial in the system size. If $1/g(P)$ is also polynomial, the dynamics remains
rapid. The reader can refer to Wilson [15] for an interesting discussion on this point.

On the contrary, at vanishing temperature, only the low energy levels are occupied,
some probabilities $\pi(x)$ tend to zero, and this second term diverges, even for
finite systems. This is not physically relevant, since $\tau_{\mathrm{var}}$ remains
finite in general: the upper bound (13) is not suitable in this regime.

Had we chosen the Euclidean norm to measure the distance of interest, we would have
been led to the similar conclusion that the equilibration time is of order $1/g(P)$
(see also [14] for further discussion). In the remainder of this paper, we shall focus
on spectral gaps rather than such equilibration times.



2. Main result 

2.1. Definitions 

Generally speaking, given a discrete time Markov chain $\mathcal{M}$ on a finite set
$\Omega$, a decomposition [10] of $\mathcal{M}$ is both a partition of $\Omega$ into
disjoint subsets $\Omega_a$, $a = 1, \ldots, A$, and a new natural dynamical rule on
$\Omega$. This dynamical rule is defined by its transition matrix $P'$ as follows: for
all $x \neq y \in \Omega$, if $x$ and $y$ belong to the same subset $\Omega_a$ then
$P'(x,y) = P(x,y)$, else $P'(x,y) = 0$. We say that we "cut transitions between the
different subsets $\Omega_a$". The diagonal terms $P'(x,x)$ absorb the probability of
the cut transitions, so that $P'$ remains stochastic. If $M_a$ denotes the restriction
of $P'$ to $\Omega_a$, then $P'$ is block-diagonal and it is the direct sum of the
$M_a$:

$$ P' = M_1 \oplus M_2 \oplus \cdots \oplus M_A. \tag{14} $$

We naturally suppose that the decomposition is designed so that each subset $\Omega_a$
is connected with respect to the new dynamical rule and that, for each $\Omega_a$ seen
in isolation, there also exists a unique stationary distribution. Then the eigenvalue 1
of $P'$ is degenerate, with an eigenspace of dimension $A$. We call $P'$ the
restriction of $P$ to the decomposition $(\Omega_a)$.
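
The construction of $P'$ can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `restrict` and the example chain (a one-dimensional walker on 4 sites, split into two halves) are assumptions introduced here, and the matrix is stored with the column convention $P(x,y)$ = probability of reaching $x$ from $y$.

```python
from fractions import Fraction

def restrict(P, blocks):
    """Restriction P' of P to a decomposition given as a list of blocks
    (lists of state indices forming a partition of the state set)."""
    n = len(P)
    label = {x: a for a, block in enumerate(blocks) for x in block}
    R = [[Fraction(0)] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Keep a transition only if it stays inside one subset.
            if x != y and label[x] == label[y]:
                R[x][y] = P[x][y]
        # The diagonal term absorbs the probability of the cut transitions.
        R[y][y] = 1 - sum(R[x][y] for x in range(n))
    return R

# Walker on a segment of 4 sites: moves with probability 1/4, else stays.
q = Fraction(1, 4)
P = [[Fraction(3, 4), q, 0, 0],
     [q, Fraction(1, 2), q, 0],
     [0, q, Fraction(1, 2), q],
     [0, 0, q, Fraction(3, 4)]]

# Decomposition into the two halves {0, 1} and {2, 3}.
R = restrict(P, [[0, 1], [2, 3]])
for row in R:
    print([str(c) for c in row])
```

In this example the edge between sites 1 and 2 is cut, and the resulting matrix is block-diagonal with two identical $2 \times 2$ blocks, in agreement with (14).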

As announced in the introduction, we shall consider the case where we have $m$
different decompositions $(\Omega_{1;a_1}), (\Omega_{2;a_2}), \ldots, (\Omega_{m;a_m})$,
$a_k = 1, \ldots, A_k$. A priori, the different $A_k$ need not be equal. Then we define
as above $m$ different restrictions, one for each decomposition $(\Omega_{k;a})$,
denoted by $P_k$. We say that we have a multi-decomposition or an $m$-decomposition.

Now we need to introduce a property of regularity. We say that a transition of
the original chain $P(x,y) > 0$ "belongs" to the decomposition $(\Omega_{k;a})$ if
$P_k(x,y) > 0$, in other words if this transition is not cut in that decomposition.

Definition. An $m$-decomposition $(\Omega_{1;a_1}), (\Omega_{2;a_2}), \ldots,
(\Omega_{m;a_m})$ of $\mathcal{M}$ is said to be regular if there exists an integer
$r \leq m$ such that each transition $P(x,y) > 0$, $x \neq y$, belongs to exactly $r$
decompositions. The integer $r$ is called the redundancy of the regular
multi-decomposition.

If the $m$-decomposition is regular of redundancy $r$ then we have the following
simple but important relation:

$$ P = \frac{P_1 + P_2 + \cdots + P_m}{r} + \frac{r - m}{r}\, I, \tag{15} $$

which interconnects the different decompositions ($I$ denotes the identity matrix).
The second term of the sum restores the diagonal coefficients $P(x,x)$.
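
Relation (15) is easy to check on a small case. The sketch below is an illustration with assumed names and sizes: it takes the two-dimensional walker on a $3 \times 4$ grid with transition rates 1/8 (so that $P(x,x) \geq 1/2$) and its two natural decompositions, for which $m = 2$ and $r = 1$, and verifies $P = P_1 + P_2 - I$ in exact arithmetic.

```python
from fractions import Fraction

def grid_walk(p1, p2):
    """Column-stochastic matrix of a walker on a p1 x p2 grid; each allowed
    nearest-neighbour move has probability 1/8, moves leaving the grid are
    replaced by staying in place. State (i, j) has index i * p2 + j."""
    n = p1 * p2
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(p1):
        for j in range(p2):
            y = i * p2 + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < p1 and 0 <= b < p2:
                    P[a * p2 + b][y] = Fraction(1, 8)
            P[y][y] = 1 - sum(P[x][y] for x in range(n))
    return P

def restrict(P, blocks):
    """Cut the transitions between different blocks; the diagonal absorbs
    the removed probability."""
    n = len(P)
    label = {x: a for a, block in enumerate(blocks) for x in block}
    R = [[Fraction(0)] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if x != y and label[x] == label[y]:
                R[x][y] = P[x][y]
        R[y][y] = 1 - sum(R[x][y] for x in range(n))
    return R

p1, p2 = 3, 4
n = p1 * p2
P = grid_walk(p1, p2)
# Decomposition 1: segments of fixed i; decomposition 2: segments of fixed j.
fixed_i = [[i * p2 + j for j in range(p2)] for i in range(p1)]
fixed_j = [[i * p2 + j for i in range(p1)] for j in range(p2)]
P1, P2 = restrict(P, fixed_i), restrict(P, fixed_j)

m, r = 2, 1  # each edge belongs to exactly one of the two decompositions
combo = [[(P1[x][y] + P2[x][y]) / r + (Fraction(r - m, r) if x == y else 0)
          for y in range(n)] for x in range(n)]
print(combo == P)
```

The comparison is an exact equality of rational matrices, which makes the role of the identity term in (15) explicit: without it the diagonal would be counted twice.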

Note that the restriction of $\mathcal{M}$ to each subset $\Omega_{k;a}$ of each
decomposition remains reversible, because if $P(x,y)$ is cut, then $P(y,x)$ is also
cut:

$$ \pi(x) P_k(y,x) = \pi(y) P_k(x,y). \tag{16} $$

In addition, $\pi$ remains a stationary distribution for each $P_k$, but it is not
unique. We denote by $E_k$ the eigenspace of eigenvectors of $P_k$ corresponding to the
eigenvalue 1. Its dimension is $A_k$ because we also suppose that on each subset
$\Omega_{k;a}$ there is a unique stationary distribution. Equation (16) also leads us
to the conclusion that $P_k$ remains self-adjoint for the scalar product (5). If we
denote by $\Pi_k$ the orthogonal projection (with respect to this scalar product) onto
$E_k$, then any eigenvector of $P_k$ (other than vectors of $E_k$) projects onto 0. We
define the multi-projection:

$$ \Pi = \Pi_1 + \Pi_2 + \cdots + \Pi_m. \tag{17} $$

Then $\Pi(\pi) = m\pi$. We say that the multi-decomposition is non-degenerate if this
eigenvalue $m$ is itself non-degenerate, which is satisfied in practice. We denote by
$\nu$ the second largest eigenvalue of $\Pi$ (in modulus) and we call it the norm of
the multi-projection $\Pi$. In the following, the difficult step will often be to
calculate this norm. We shall discuss this point in subsection 2.3 and give several
examples in sections 3, 4 and 5.

Definition. A non-degenerate $m$-decomposition, which is regular of redundancy $r$
and which has an associated multi-projection of norm $\nu$, is called a non-degenerate
$(m, r, \nu)$-multi-decomposition.

2.2. Main theorem 

The eigenspace $E_k$ of $P_k$ corresponding to the eigenvalue 1 is degenerate: we
denote by $g(P_k)$ the difference between 1 and the largest eigenvalue of $P_k$ that
is strictly smaller than 1.

Theorem 1. Let $P$ be the transition matrix of a Markov chain $\mathcal{M}$ on $\Omega$
and $(\Omega_{1;a_1}), (\Omega_{2;a_2}), \ldots, (\Omega_{m;a_m})$ be a non-degenerate
$(m,r,\nu)$-multi-decomposition of $\mathcal{M}$. If $P_k$ denotes the restriction of
$P$ to the decomposition $(\Omega_{k;a_k})$, then the following gap relation holds:

$$ g(P) \geq \frac{m - \nu}{r} \inf_k g(P_k). \tag{18} $$

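Proof: For the sake of convenience, we work in this proof with the matrix:

$$ \tilde P = \frac{P_1 + P_2 + \cdots + P_m}{m} = \frac{r}{m} P + \frac{m - r}{m} I. \tag{19} $$

At the end of the calculation, the gap of $P$ will simply be: $g(P) = (m/r)\, g(\tilde P)$.

Given a state vector $e$, we write $e_k^{\mathrm{eq}} = \Pi_k(e)$ (the superscript "eq"
stands for "equilibrium"); $e_k^{\mathrm{eq}}$ depends on $k$ and $e$, since the
eigenspace of $P_k$ corresponding to the eigenvalue 1 is degenerate. All the difficulty
in the following lies in this degeneracy.

We suppose now that we have sorted together all the eigenvalues (different from 1) of
the $m$ matrices $P_k$: $1 > \mu_1 \geq \mu_2 \geq \cdots \geq \mu_q \geq 0$. We denote
by $f_j$ the normalized eigenstate corresponding to the eigenvalue $\mu_j$; $f_j$ can a
priori be the eigenstate of any matrix $P_k$. The vector $e - \pi$ is orthogonal to
$\pi$ and we write

$$ e - \pi = \frac{1}{m} \sum_{k=1}^{m} (e - e_k^{\mathrm{eq}}) + \frac{1}{m} \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) \tag{20} $$

$$ = \frac{1}{m} \sum_{j=1}^{q} \alpha_j f_j + \frac{1}{m} \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi), $$

where each vector $(e - e_k^{\mathrm{eq}})$ has been projected on the eigenbasis of
$P_k$ (with coefficients $\alpha_j$), and

$$ \tilde P (e - \pi) = \frac{1}{m} \sum_{k=1}^{m} P_k (e - \pi) \tag{21} $$

$$ = \frac{1}{m} \sum_{k=1}^{m} P_k (e - e_k^{\mathrm{eq}}) + \frac{1}{m} \sum_{k=1}^{m} P_k (e_k^{\mathrm{eq}} - \pi) $$

$$ = \frac{1}{m} \sum_{j=1}^{q} \mu_j \alpha_j f_j + \frac{1}{m} \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi), $$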
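by definition of $f_j$ and $\mu_j$. Now, thanks to a suitable Abel transform,

$$ \tilde P (e - \pi) = \frac{1}{m} (1 - \mu_1) \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) \tag{22} $$

$$ \quad + \frac{1}{m} \sum_{s=1}^{q-1} (\mu_s - \mu_{s+1}) \left[ \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) + \sum_{j=1}^{s} \alpha_j f_j \right] $$

$$ \quad + \frac{\mu_q}{m} \left[ \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) + \sum_{j=1}^{q} \alpha_j f_j \right]. $$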

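Thus, if $\| \cdot \|$ stands for the Euclidean norm associated with the scalar
product (5),

$$ \| \tilde P (e - \pi) \| \leq \frac{1}{m} (1 - \mu_1) \Big\| \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) \Big\| \tag{23} $$

$$ \quad + \frac{1}{m} \sum_{s=1}^{q-1} (\mu_s - \mu_{s+1}) \Big\| \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) + \sum_{j=1}^{s} \alpha_j f_j \Big\| + \frac{\mu_q}{m} \Big\| \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) + \sum_{j=1}^{q} \alpha_j f_j \Big\| $$

$$ \leq \frac{1}{m} (1 - \mu_1) \Big\| \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) \Big\| + \mu_1 \| e - \pi \|. $$

Indeed, $\mu_s - \mu_{s+1} \geq 0$ and $\mu_q \geq 0$; moreover, for any $s \leq q$,

$$ \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) + \sum_{j=1}^{s} \alpha_j f_j $$

is the sum of $m$ orthogonal projections of $e - \pi$ on suitable spaces, the norm of
each of them being therefore smaller than $\| e - \pi \|$. In addition,
$\sum_k (e_k^{\mathrm{eq}} - \pi) = \Pi(e - \pi)$ and by definition of $\nu$,

$$ \Big\| \sum_{k=1}^{m} (e_k^{\mathrm{eq}} - \pi) \Big\| \leq \nu \| e - \pi \|. \tag{24} $$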
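As a consequence,

$$ \| \tilde P (e - \pi) \| \leq \left[ \frac{\nu}{m} (1 - \mu_1) + \mu_1 \right] \| e - \pi \|, \tag{25} $$

from which it follows that

$$ g(\tilde P) \geq \frac{m - \nu}{m} (1 - \mu_1) \geq \frac{m - \nu}{m} \inf_k g(P_k). \tag{26} $$

Thus

$$ g(P) \geq \frac{m - \nu}{r} \inf_k g(P_k). \tag{27} $$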

2.3. How to calculate the norm $\nu$ of the multi-projection

As mentioned above, the difficult step in the present multi-decomposition technique
is generally to calculate the norm $\nu$. In sections 3 and 4 we list several examples
where this calculation is feasible exactly. The idea is to construct a matrix which
has the same non-zero eigenvalues as $\Pi$, which is much smaller than $\Pi$, and the
elements of which depend only on the equilibrium distribution $\pi$ and the subsets
$\Omega_{k;a}$. This is an important point: $\nu$ is calculated at equilibrium, and the
spectral gap of $P$, that is its dynamical rate of convergence, is eventually
controlled by this equilibrium quantity.

However, even in the cases where this exact calculation is out of reach, the present
technique is still a significant advance as compared to direct diagonalization of the
matrix $P$ to obtain $g(P)$, because in order to calculate $\nu$ numerically, one has
to diagonalize a much smaller matrix than $P$. This point of view is used at the end
of section 5 to support a conjecture about the "Contingency table problem".



As already remarked in section 2.1, given a decomposition
$(\Omega_{k;a})_{a=1,\ldots,A_k}$, since the different subsets have been
"disconnected", the vector space $E_k$ of eigenstates corresponding to the eigenvalue 1
is degenerate, of dimension $A_k$. In order to write the projections
$e_k^{\mathrm{eq}} - \pi = \Pi_k(e - \pi)$, which belong to $E_k$, we need a suitable
basis of $E_k$. For each $k$ and each $a$, we define the vector $g_{k;a}$, the
coordinates of which are equal to $g_{k;a}(x) = \pi(x)$ for any state
$x \in \Omega_{k;a}$ and to 0 anywhere else. Note that

$$ \| g_{k;a} \| = \Big[ \sum_{x \in \Omega_{k;a}} \pi(x) \Big]^{1/2}. \tag{28} $$

We set

$$ u_{k;a} = \frac{g_{k;a}}{\| g_{k;a} \|}. \tag{29} $$

Then $(u_{k;a})_{a=1,\ldots,A_k}$ is an orthonormal basis of $E_k$ and by definition of
$\Pi_k$ (section 2.1),

$$ \Pi_k(f) = \sum_{a=1}^{A_k} \langle f | u_{k;a} \rangle\, u_{k;a} \tag{30} $$

for any vector $f$. Hence

$$ \Pi(f) = \sum_k \Pi_k(f) = \sum_{k=1}^{m} \sum_{a=1}^{A_k} \langle f | u_{k;a} \rangle\, u_{k;a}. \tag{31} $$

Let us denote by $U$ the matrix which has the different vectors $u_{k;a}$, for all $k$
and all $a$, as column vectors; this matrix has $N$ rows and $\sum_k A_k$ columns (the
total number of subsets $\Omega_{k;a}$). We also denote by $\mathcal{D}$ the diagonal
matrix whose diagonal elements are the $\pi(x)$, $x \in \Omega$. Then

$$ \Pi(f) = U U^t \mathcal{D}^{-1} f, \tag{32} $$

where $U^t$ is the transpose of $U$, with the vectors $u_{k;a}$ as row vectors. We are
interested in the eigenvalues of $\Pi$ and therefore of $U U^t \mathcal{D}^{-1}$. Now,
given any two matrices $A$ and $B$, a common theorem in basic linear algebra says that
$AB$ and $BA$ have the same non-zero eigenvalues, even if $A$ and $B$ are not square.
As a consequence, $\Pi$ has the same non-zero eigenvalues as
$\hat\Pi = U^t \mathcal{D}^{-1} U$, the coefficients of which are

$$ \hat\Pi_{(k;a),(l;b)} = \langle u_{k;a} | u_{l;b} \rangle = \frac{\sum_{x \in \Omega_{k;a} \cap \Omega_{l;b}} \pi(x)}{\big[ \sum_{x \in \Omega_{k;a}} \pi(x) \big]^{1/2} \big[ \sum_{x \in \Omega_{l;b}} \pi(x) \big]^{1/2}}. \tag{33} $$

This matrix is positive semi-definite; its eigenvalues are non-negative.

Theorem 2. Let $(\Omega_{1;a_1}), (\Omega_{2;a_2}), \ldots, (\Omega_{m;a_m})$ be a
non-degenerate $(m,r,\nu)$-multi-decomposition of a Markov chain $\mathcal{M}$ of
stationary distribution $\pi(x)$. The norm $\nu$ of the associated multi-projection is
the second largest eigenvalue of the matrix $\hat\Pi$ defined by eq. (33).

The dimension of the square matrix $\hat\Pi$ is $\sum_k A_k$. This quantity is in
general much smaller than the dimension $N$ of $P$. If the distribution $\pi$ is
uniform (for example at infinite temperature), then

$$ \hat\Pi_{(k;a),(l;b)} = \frac{|\Omega_{k;a} \cap \Omega_{l;b}|}{|\Omega_{k;a}|^{1/2}\, |\Omega_{l;b}|^{1/2}}, \tag{34} $$

where $|S|$ stands for the cardinality of any set $S$. One may also choose to work in
a different basis where the above matrix may have a more practical expression: if $Q$
is the diagonal matrix with diagonal coefficients $|\Omega_{k;a}|^{1/2}$, then
$\tilde\Pi = Q \hat\Pi Q^{-1}$ has the same eigenvalues as $\hat\Pi$ (and the same
non-zero eigenvalues as $\Pi$). Its coefficients are:

$$ \tilde\Pi_{(k;a),(l;b)} = \frac{|\Omega_{k;a} \cap \Omega_{l;b}|}{|\Omega_{l;b}|}. \tag{35} $$

The same kind of transformation can also be used in the case where $\pi$ is not
uniform:

$$ \tilde\Pi_{(k;a),(l;b)} = \frac{\sum_{x \in \Omega_{k;a} \cap \Omega_{l;b}} \pi(x)}{\sum_{x \in \Omega_{l;b}} \pi(x)}. \tag{36} $$

This is the conditional probability, at equilibrium, that the system is in the subset
$\Omega_{k;a}$ given that it is in the subset $\Omega_{l;b}$.
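
To make theorem 2 concrete, the following sketch (an illustration with assumed function names, not code from the original work) builds the matrices of eqs. (33) and (36) from an equilibrium distribution and a list of decompositions, and estimates $\nu$ by power iteration on the symmetric matrix of eq. (33) in the orthogonal complement of its top eigenvector, whose components are the square roots of the subset weights and whose eigenvalue is $m$. Since the matrix is positive semi-definite, this iteration converges to the second largest eigenvalue. The example is the two-dimensional walker on a $2 \times 3$ grid with its two natural decompositions and a uniform distribution.

```python
import math

def projection_matrices(pi, decompositions):
    """Return (hat, tilde, weight): the symmetric matrix of eq. (33), the
    conditional-probability matrix of eq. (36) and the equilibrium weight of
    each subset, indexed by the subsets of all decompositions in order."""
    subsets = [frozenset(s) for dec in decompositions for s in dec]
    weight = [sum(pi[x] for x in s) for s in subsets]
    n = len(subsets)
    overlap = [[sum(pi[x] for x in subsets[i] & subsets[j]) for j in range(n)]
               for i in range(n)]
    hat = [[overlap[i][j] / math.sqrt(weight[i] * weight[j])
            for j in range(n)] for i in range(n)]
    tilde = [[overlap[i][j] / weight[j] for j in range(n)] for i in range(n)]
    return hat, tilde, weight

def second_largest_eigenvalue(hat, weight, m, iterations=200):
    """Power iteration orthogonal to the top eigenvector of hat."""
    n = len(hat)
    top = [math.sqrt(w / m) for w in weight]          # unit vector
    v = [math.sin(1.0 + 1.7 * i) for i in range(n)]   # arbitrary start
    nu = 0.0
    for _ in range(iterations):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-14:  # nothing left outside the top eigenvector
            return 0.0
        v = [a / norm for a in v]
        hv = [sum(hat[i][j] * v[j] for j in range(n)) for i in range(n)]
        nu = sum(a * b for a, b in zip(v, hv))        # Rayleigh quotient
        v = hv
    return nu

p1, p2 = 2, 3
pi = [1.0 / (p1 * p2)] * (p1 * p2)  # uniform; state (i, j) has index i*p2 + j
fixed_i = [[i * p2 + j for j in range(p2)] for i in range(p1)]
fixed_j = [[i * p2 + j for i in range(p1)] for j in range(p2)]
hat, tilde, weight = projection_matrices(pi, [fixed_i, fixed_j])
nu = second_largest_eigenvalue(hat, weight, m=2)
print(nu, tilde[0][2], tilde[2][0])
```

The matrices here are $5 \times 5$ whereas $P$ is $6 \times 6$; the saving becomes substantial for larger systems, since the size is the total number of subsets rather than the number of states.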



3. Elementary examples 

In this section, we display several examples where the multi-decomposition technique 
is particularly efficient. In these three examples, the lower bound provided by the 
method can be compared to exact values of the gap calculated by independent means. 
In these three cases, the lower bound of theorem 1 turns out to be an equality.

3.1. Random walk on an $m$-dimensional cube

We consider a random walker on an $m$-dimensional hyper-cube. We first derive exactly
the spectral gap of this Markov chain. Then we prove that the lower bound provided
by the multi-decomposition method is in fact an equality.

Our random walker moves on the vertices of a cube, $\Omega_m = \{0,1\}^m$. He moves
along the edges of the cube. His position is denoted by $(x_1, x_2, \ldots, x_m)$. At
each step, he chooses randomly an index $i$ among $m$ as well as a direction
$b = \pm 1$. If the move $x_i \to x_i + b$ is possible then he performs this move, else
he stays at the same place. The allowed transition rates are all equal to $1/2m$. Then
the transition matrix $P^{(m)}$ of this chain can be written as a sum of tensor
products involving $m$ one-dimensional walkers on $\{0,1\}$:

$$ P^{(m)} = \frac{1}{m} \sum_{k=1}^{m} 1 \otimes \cdots \otimes 1 \otimes t_k \otimes 1 \otimes \cdots \otimes 1, \tag{37} $$

because the walker can choose one of the $m$ directions with equal probability $1/m$
and then performs a one-dimensional walk in that direction. The transition matrix of
each one-dimensional walker in the direction $k$ is

$$ t_k = t = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \tag{38} $$

The eigenvalues of $t$ are 1 and 0, and $g(P^{(m)}) = 1/m$.

We now use the multi-decomposition technique in order to compare the obtained
lower bound to this exact value: we relate the spectral gap of $P^{(m)}$ to that of a
random walker on an $(m-1)$-dimensional hyper-cube $\Omega_{m-1}$. There are $m$
natural ways to build the decompositions, by preventing moves in one (and only one) of
the $m$ directions of space. Then there are 2 subsets in each decomposition, and they
are isomorphic to a hyper-cube $\Omega_{m-1}$.

Furthermore, each transition belongs to exactly $m-1$ decompositions among $m$,
more precisely to every decomposition save the one where this transition is forbidden.
Therefore this multi-decomposition is regular of redundancy $m-1$.

Now we consider the multi-projection associated with this multi-decomposition.
Let us prove with the help of relation (35) (and theorem 2) that its norm is $\nu = 1$.
In this case, $|\Omega_{k;a}| = 2^{m-1}$, the cardinality of an $(m-1)$-dimensional
hyper-cube, for all $k = 1, \ldots, m$ and all $a = 1, 2$; and
$|\Omega_{k;a} \cap \Omega_{l;b}| = 2^{m-2}$ whenever $k \neq l$. Hence
$\tilde\Pi_{(k;a),(l;b)} = 1/2$ if $k \neq l$, and $\tilde\Pi$ is a block-matrix of
$2 \times 2$ blocks. Diagonal blocks are equal to the identity and non-diagonal ones to
$t$. It is an elementary exercise to compute the eigenvalues of such a matrix: $m$ is
the largest eigenvalue, 1 is $m$ times degenerate, and the $m-1$ remaining eigenvalues
are equal to 0. We conclude that $\nu = 1$. According to theorem 1,
$g(P^{(m)}) \geq \inf_{k=1,\ldots,m} g(P_k)$. All the matrices $P_k$ have the same gap.

Now it is time to draw the reader's attention to the following important point: the
(allowed) non-diagonal transition rates in the matrices $P^{(m)}$ and $P_k$ are equal.
Therefore $P_k$ is the direct sum (see eq. (14)) of 2 transition matrices of an
$(m-1)$-dimensional random walker, but with non-diagonal transition rates $1/2m$
instead of $1/2(m-1)$ in $P^{(m-1)}$. As a consequence,

$$ g(P_k) = \frac{m-1}{m}\, g(P^{(m-1)}) \tag{39} $$

and

$$ g(P^{(m)}) \geq \frac{m-1}{m}\, g(P^{(m-1)}) \geq \cdots \geq \frac{1}{m}\, g(P^{(1)}) = \frac{1}{m}. \tag{40} $$

Here, the lower bound calculated via the multi-decomposition technique is the exact
spectral gap.
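
The exact value $g(P^{(m)}) = 1/m$ can be cross-checked directly on a small cube. The sketch below is an independent check under stated assumptions, not the argument of the text: for $m = 3$ it builds $P^{(m)}$ explicitly and verifies, in exact arithmetic, that the $2^m$ vectors $f_S(x) = (-1)^{\sum_{i \in S} x_i}$, one for each subset $S$ of coordinates, are eigenvectors with eigenvalue $1 - |S|/m$. These vectors form a complete basis, so the whole spectrum is obtained. Vertices are encoded as integers whose bits are the coordinates; the function names are illustrative.

```python
from fractions import Fraction

def hypercube_walk(m):
    """Column-stochastic matrix of the walker on {0,1}^m: each of the m
    coordinates is flipped with probability 1/(2m), else the walker stays."""
    n = 2 ** m
    P = [[Fraction(0)] * n for _ in range(n)]
    for y in range(n):
        P[y][y] = Fraction(1, 2)
        for k in range(m):
            P[y ^ (1 << k)][y] = Fraction(1, 2 * m)
    return P

def parity_vector(m, mask):
    """f_S(x) = (-1)^(number of coordinates of x in S), S encoded by mask."""
    return [(-1) ** bin(x & mask).count("1") for x in range(2 ** m)]

m = 3
n = 2 ** m
P = hypercube_walk(m)
eigenvalues = []
for mask in range(n):
    f = parity_vector(m, mask)
    lam = 1 - Fraction(bin(mask).count("1"), m)
    Pf = [sum(P[x][y] * f[y] for y in range(n)) for x in range(n)]
    assert Pf == [lam * v for v in f]  # f is an eigenvector for lam
    eigenvalues.append(lam)

gap = 1 - sorted(eigenvalues)[-2]
print(gap)
```

For $m = 3$ the spectrum found this way is $\{1, 2/3, 1/3, 0\}$ with multiplicities $1, 3, 3, 1$, and the gap equals $1/3$, the value given by (40).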

3.2. Random walk on an $m$-dimensional box

We extend this calculation to the case considered in the introduction, where the
$m$-dimensional box $\Omega_m$ is not necessarily a cube any longer. Its side lengths
are denoted by $p_1, p_2, \ldots, p_m$. We follow the same line as in the previous
example except that now $p_k \geq 2$. However, before going on, we need to clarify the
question of boundary conditions, which was temporarily postponed in the introduction.
In fact, we shall demonstrate the following striking property: the details of the
proof do not depend precisely on boundary conditions provided the subsets
$\Omega_{k;a}$ are the same, as well as the stationary distribution $\pi$. This is the
first illustration of a general feature of the multi-decomposition technique that was
already mentioned in section 2.3, and that will be largely discussed in the following
sections: $\nu$ is an equilibrium quantity.

We consider both "von Neumann" and periodic boundary conditions. In the first
case, and as in our previous example, if the walker tries to move outside the grid, he
stays at the same place. In the second case, the grid lies on a torus and the walker
always has $2m$ possible moves; all the vertices play the same role. In both cases, the
allowed transition rates are equal to $1/4m$. As compared to the previous example,
where they were set to $1/2m$, these transition rates ensure that $P(x,x) \geq 1/2$ (in
the previous example, this property was due to the fact that all vertices lay on the
boundary).

The exact calculation of the gap of the transition matrix $P^{(m)}$ is similar to the previous one: $P^{(m)}$ can also be written as a tensor product of $m$ one-dimensional walkers (eq. (37)), since the walker chooses one of the $m$ directions with equal probability and performs a one-dimensional walk in that direction. However, the transition matrix $t_k$ of the one-dimensional walker now depends both on the side length $p_k$ and on the boundary conditions. For von Neumann ones,

$$
t_k = \frac{1}{4}
\begin{pmatrix}
3 & 1 & & & \\
1 & 2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & 2 & 1 \\
 & & & 1 & 3
\end{pmatrix}
\tag{41}
$$

is a $p_k \times p_k$ matrix. Its eigenvalues are $\lambda_j = \cos(j\pi/p_k)/2 + 1/2$, where $j = 0, 1, \ldots, p_k - 1$. The eigenstates of $P^{(m)}$ are the tensor products of the one-dimensional ones and its eigenvalues are all the combinations

$$
\lambda_{j_1, j_2, \ldots, j_m} = \frac{\lambda_{j_1} + \lambda_{j_2} + \cdots + \lambda_{j_m}}{m}.
\tag{42}
$$

If $\sup_k p_k = p_{k_0}$, the second largest eigenvalue is obtained when all the $j_k$'s are set to 0 except for the $k_0$-th one, which is set to 1. Thus $g(P^{(m)}) = (1 - \cos(\pi/p_{k_0}))/2m$. In the case of periodic boundary conditions, a similar calculation leads to $g(P^{(m)}_{\mathrm{periodic}}) = (1 - \cos(2\pi/p_{k_0}))/2m$, with

$$
t_{k,\mathrm{periodic}} = \frac{1}{4}
\begin{pmatrix}
2 & 1 & & & 1 \\
1 & 2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & 2 & 1 \\
1 & & & 1 & 2
\end{pmatrix}
\tag{43}
$$

instead of $t_k$.
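The two closed-form gaps above can be checked numerically on a small grid. The following is a minimal sketch, assuming that the tensor-product form of eq. (37) amounts to $P^{(m)} = \frac{1}{m}\sum_k 1 \otimes \cdots \otimes t_k \otimes \cdots \otimes 1$; the function names (`t_von_neumann`, `t_periodic`, `grid_walker`, `gap`) and the chosen sides are illustrative and do not come from the paper.

```python
import numpy as np

def t_von_neumann(p):
    """Lazy one-dimensional walker on p >= 2 sites; moves leaving the segment are rejected."""
    t = 0.5 * np.eye(p)
    for i in range(p - 1):
        t[i, i + 1] = t[i + 1, i] = 0.25
    t[0, 0] = t[-1, -1] = 0.75
    return t

def t_periodic(p):
    """Lazy one-dimensional walker on a ring of p >= 3 sites."""
    t = 0.5 * np.eye(p)
    for i in range(p):
        t[i, (i + 1) % p] = t[(i + 1) % p, i] = 0.25
    return t

def grid_walker(sides, one_dim):
    """Assumed form of P^(m): average over directions of Kronecker products."""
    m = len(sides)
    size = int(np.prod(sides))
    P = np.zeros((size, size))
    for k in range(m):
        term = np.eye(1)
        for j, q in enumerate(sides):
            term = np.kron(term, one_dim(q) if j == k else np.eye(q))
        P += term / m
    return P

def gap(P):
    """Spectral gap 1 - (second largest eigenvalue) of a symmetric stochastic matrix."""
    ev = np.sort(np.linalg.eigvalsh(P))
    return 1.0 - ev[-2]

sides = (3, 4)
m, p_max = len(sides), max(sides)
gap_vn = gap(grid_walker(sides, t_von_neumann))
gap_per = gap(grid_walker(sides, t_periodic))
formula_vn = (1 - np.cos(np.pi / p_max)) / (2 * m)
formula_per = (1 - np.cos(2 * np.pi / p_max)) / (2 * m)
print(gap_vn, formula_vn)
print(gap_per, formula_per)
```

Dense Kronecker products are only practical for very small grids; the point of the sketch is to compare the numerical gap with the two formulas, not to scale.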




Figure 3. A decomposition (among 3 possible ones) in the case of a random walker on a three-dimensional cubic grid of sides $p_1 \times p_2 \times p_3$. The vertical moves, which are allowed for the three-dimensional walker, are now forbidden in this decomposition. The walker is constrained to move on one of the $p_3$ two-dimensional grids of sides $p_1 \times p_2$, denoted by $\Omega_{3;a}$.

We now decompose the $m$-dimensional walker into $(m-1)$-dimensional ones (figure 3). The argument is exactly the same as in the previous section 3.1, except that $g(P_k)$ now depends on $k$ since $p_k$ does. For each $k$, the decomposition $(\Omega_{k;a})$ is obtained by cutting the edges in the direction $k$ of the grid. For both boundary conditions, this results in constraining the walker to move on a $(m-1)$-dimensional grid, with the same boundary conditions as in the original grid $\Omega_m$. In addition, the matrix $\Pi$ is slightly more complex: we calculate

$$
|\Omega_{k;a}| = \prod_{j \neq k} p_j, \qquad
|\Omega_{k;a} \cap \Omega_{l;b}| = \prod_{j \neq k,l} p_j
\qquad \text{and} \qquad
\Pi_{(k;a),(l;b)} = \frac{1}{p_k}
\tag{44}
$$

whenever $k \neq l$, whatever $a$ and $b$. The eigenvalues of $\Pi$ can also be exactly calculated and one also gets $v = 1$. The redundancy is $r = m - 1$, as in the previous example. Theorem 1 can be applied and one gets the gap relation $g(P^{(m)}) \geq \inf_k g(P_k)$. We recall that $P_k$ is the direct sum (see eq. (14)) of $p_k$ transition matrices of a $(m-1)$-dimensional random walker, with transition rates $1/4m$ instead of $1/4(m-1)$, so that each of them has a gap reduced by a factor $(m-1)/m$. Taking this fact into account, one gets by induction

$$
g(P^{(m)}) \geq \frac{1}{2m} \inf_{k=1,\ldots,m} \left(1 - \cos(\pi/p_k)\right)
\tag{45}
$$

for von Neumann boundary conditions. The exact gap was calculated above and this lower bound is in fact exact.

The conclusion is the same for periodic boundary conditions: since the subsets $\Omega_{k;a}$ are the same as in the von Neumann case, the matrix $\Pi$ is identical for both boundary conditions. Therefore $v$ and $r$ are also equal in both cases, and the gap relations are identical. The two proofs are identical except for the first step of the induction, that is to say the spectral gaps of the one-dimensional random walkers, which differ. One finally also gets the exact gap calculated above for periodic boundary conditions.



3.3. Random walk on the symmetric group with random transpositions 

This Markov chain belongs to the "Card shuffling models". In this example, we re-derive the spectral gap, which is usually calculated by group representation theory arguments [17]. In this case, the multi-decomposition technique is remarkably simple and efficient and again provides an exact lower bound.

Given an integer $m \geq 2$ and the symmetric group $S_m$, the "Random transposition Markov chain" [17] is defined on $\Omega = S_m$ as follows: given a state (a permutation) $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_m)$, pick two different positions $k$ and $l$ uniformly at random, then transpose $\sigma_k$ and $\sigma_l$ with probability 1/2 (so that $P(x,x) \geq 1/2$). This Markov chain converges towards the uniform distribution on $S_m$ (see [17]). The transition matrix $P^{(m)}$ is defined as follows: if two different permutations $x$ and $y$ differ by a single transposition then $P(x,y) = 1/[m(m-1)]$. In reference [17], it is proven that

$$
g(P^{(m)}) = \frac{1}{m-1}
\tag{46}
$$

with arguments based on the theory of representations of $S_m$ (the result is slightly different in reference [17] because transition rates are slightly different).
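For small $m$ the value (46) can be confirmed by brute-force diagonalization. The sketch below builds the transition matrix from the definition above (rate $1/[m(m-1)]$ per transposition, the remaining probability on the diagonal); the helper names are illustrative.

```python
import itertools
import numpy as np

def random_transposition_matrix(m):
    """Transition matrix of the lazy random-transposition chain on S_m."""
    perms = list(itertools.permutations(range(m)))
    index = {s: i for i, s in enumerate(perms)}
    rate = 1.0 / (m * (m - 1))
    P = np.zeros((len(perms), len(perms)))
    for s in perms:
        for k, l in itertools.combinations(range(m), 2):
            t = list(s)
            t[k], t[l] = t[l], t[k]          # transpose positions k and l
            P[index[s], index[tuple(t)]] += rate
    P += np.diag(1.0 - P.sum(axis=1))        # holding probability
    return P

def gap(P):
    """Spectral gap of a symmetric stochastic matrix."""
    ev = np.sort(np.linalg.eigvalsh(P))
    return 1.0 - ev[-2]

gaps = {m: gap(random_transposition_matrix(m)) for m in (2, 3, 4, 5)}
for m, g in gaps.items():
    print(m, g, 1 / (m - 1))
```

The matrix has $m!$ rows, so this check is limited to a handful of values of $m$.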

Now we decompose the random walk on $S_m$ into random walks on $S_{m-1}$ in order to compute inductively the lower bound on the gap. The decomposition is natural: for a given $k = 1, \ldots, m$, we fix the position of $\sigma_k$ and we prevent transpositions involving this index $k$. The subsets $\Omega_{k;a}$ are defined by the fixed value $\sigma_k = a$. The resulting chain on each $\Omega_{k;a}$ is the same as the original one on a set of cardinality $m-1$, with transition rates $1/[m(m-1)]$ instead of $1/[(m-1)(m-2)]$. Now we calculate $\Pi$:

$$
|\Omega_{k;a}| = (m-1)!\,; \qquad
|\Omega_{k;a} \cap \Omega_{l;b}| = (m-2)!\,; \qquad
\Pi_{(k;a),(l;b)} = \frac{|\Omega_{k;a} \cap \Omega_{l;b}|}{|\Omega_{l;b}|} = \frac{1}{m-1},
\tag{47}
$$

when $k \neq l$ and $a \neq b$. If $k \neq l$ but $a = b$, $\Omega_{k;a} \cap \Omega_{l;b} = \emptyset$ since $\sigma_k$ and $\sigma_l$ cannot be equal, and $\Pi_{(k;a),(l;b)} = 0$.

Hence $\Pi$ is a block-matrix with $m$ lines and $m$ columns of blocks

$$
\Pi =
\begin{pmatrix}
1 & B & \cdots & B \\
B & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & B \\
B & \cdots & B & 1
\end{pmatrix}
\tag{48}
$$

where each block is itself a $m \times m$ matrix and

$$
B =
\begin{pmatrix}
0 & \frac{1}{m-1} & \cdots & \frac{1}{m-1} \\
\frac{1}{m-1} & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \frac{1}{m-1} \\
\frac{1}{m-1} & \cdots & \frac{1}{m-1} & 0
\end{pmatrix}.
\tag{49}
$$

The eigenvalues of $B$ are 1 (non-degenerate) and $-1/(m-1)$, $m-1$ times degenerate, with respective eigenvectors $X_0 = (1, \ldots, 1)$ and $X_i = (0, \ldots, 0, 1, -1, 0, \ldots, 0)$, $i = 1, \ldots, m-1$. Then one computes the $m^2$ eigenvectors of $\Pi$: the $(m-1)^2$ vectors $(0, \ldots, 0, X_i, -X_i, 0, \ldots, 0)$, where $i > 0$, with the same eigenvalue $m/(m-1)$; the $(m-1)$ vectors $(0, \ldots, 0, X_0, -X_0, 0, \ldots, 0)$, with eigenvalue 0; the $(m-1)$ vectors $(X_i, X_i, \ldots, X_i)$ with eigenvalue 0; and the vector $(X_0, X_0, \ldots, X_0)$ with eigenvalue $m$. Finally, the multi-decomposition is non-degenerate and

$$
v = \frac{m}{m-1}.
\tag{50}
$$

This value tends to 1 when $m$ is large but is not equal to 1. This nuance is crucial in order to get the correct behavior of the gap with $m$.
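The spectrum of $\Pi$ quoted above can be reproduced directly from the subsets $\Omega_{k;a}$, without using the block structure. This is a small sketch; it assumes the entry convention $\Pi_{(k;a),(l;b)} = |\Omega_{k;a} \cap \Omega_{l;b}| / |\Omega_{l;b}|$ (immaterial here, since all subsets have the same size), and the function name is illustrative.

```python
import itertools
import numpy as np

def multi_projection_sym(m):
    """Matrix Pi for the decomposition of S_m obtained by fixing sigma_k = a."""
    perms = list(itertools.permutations(range(m)))
    labels = [(k, a) for k in range(m) for a in range(m)]
    subsets = {(k, a): {s for s in perms if s[k] == a} for (k, a) in labels}
    Pi = np.zeros((m * m, m * m))
    for i, ka in enumerate(labels):
        for j, lb in enumerate(labels):
            Pi[i, j] = len(subsets[ka] & subsets[lb]) / len(subsets[lb])
    return Pi

m = 4
Pi = multi_projection_sym(m)
ev = np.sort(np.linalg.eigvalsh(Pi))[::-1]   # Pi is symmetric in this example
print(np.round(ev, 6))
```

For $m = 4$ the sorted spectrum consists of the eigenvalue $m$ once, $m/(m-1)$ repeated $(m-1)^2$ times, and 0 repeated $2(m-1)$ times, in agreement with the counting in the text.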

On the other hand, this multi-decomposition is regular with redundancy $r = m - 2$. Indeed, each transposition of indices $k$ and $l$ is forbidden in exactly two decompositions, namely $(\Omega_{k;a})$ and $(\Omega_{l;b})$. Then theorem 1 provides the gap relation:

$$
g(P^{(m)}) \geq \frac{m}{m-1} \inf_k g(P_k).
\tag{51}
$$

Taking into account the difference of transition rates in $P_k$ and $P^{(m-1)}$, we have $g(P_k) = \frac{m-2}{m}\, g(P^{(m-1)})$ and

$$
g(P^{(m)}) \geq \frac{m-2}{m-1}\, g(P^{(m-1)}) \geq \cdots \geq \frac{1}{m-1}\, g(P^{(2)}) = \frac{1}{m-1}
\tag{52}
$$

because $g(P^{(2)}) = 1$. Once again, this bound is the exact gap.

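The telescoping product behind (52) can be spelled out with exact rational arithmetic. The sketch below simply iterates relation (51) together with $g(P_k) = \frac{m-2}{m} g(P^{(m-1)})$, starting from $g(P^{(2)}) = 1$; the function name is illustrative.

```python
from fractions import Fraction

def gap_lower_bound(m):
    """Lower bound on g(P^(m)) obtained by iterating relations (51)-(52)."""
    bound = Fraction(1)                       # g(P^(2)) = 1
    for j in range(3, m + 1):
        # one induction step: factor j/(j-1) from (51), (j-2)/j from the rate rescaling
        bound = Fraction(j, j - 1) * Fraction(j - 2, j) * bound
    return bound

bounds = {m: gap_lower_bound(m) for m in range(2, 9)}
for m, b in bounds.items():
    print(m, b)
```

Each step multiplies the bound by $(j-2)/(j-1)$, so the product collapses to $1/(m-1)$.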
4. Urn problems 

In this section, we show that the multi-decomposition technique can be applied to a sub-class of the larger class of "Urn models" [18, 19]. The latter have been designed as toy models for glassy dynamics; the perhaps better-known "Backgammon model" belongs to this class. For the first time, we explicitly introduce temperature in our method. However, for the sake of simplicity, we shall essentially focus on dynamics at vanishing temperature $T \to 0$.

In urn models, $m$ identical urns contain $n$ identical (distinguishable or not) particles, or balls. In the historical Ehrenfest urn model, $m = 2$ and $n$ is large. Balls can be moved from one urn to another, which defines the elementary moves. The different rules of exchange characterize the different urn models. In general, they are defined so that the system favors configurations with as many empty urns as possible, and ultimately a single urn containing all the balls. Usually, these urn models are endowed with a Hamiltonian: the energy $E$ of a configuration is the number of non-empty urns.

We shall focus on two different models governed by a Metropolis algorithm at $T \to 0$ based upon the previous Hamiltonian. These two models are identical, except that balls are indistinguishable in the first model and distinguishable in the second one. We shall see that this difference has dramatic consequences on their dynamics. The first model is usually known as the "Monkey model" and the second one as the "Backgammon model". Their elementary moves are defined as follows:

(i) "Monkey model" [19]: balls are indistinguishable; two distinct urns are chosen uniformly at random among the $m$ possible ones, a "departure" urn $D$ and an "arrival" urn $A$. If a ball can be moved from $D$ to $A$ without increasing the energy $E$, that is to say if $D$ and $A$ are not empty, this move is performed†;

(ii) "Backgammon model": balls are distinguishable; among all the balls contained in the $m$ urns, one ball $b$ is chosen at random and its urn is denoted by $D$. A second urn, denoted by $A$, is chosen uniformly at random among the $m - 1$ remaining ones. The ball $b$ is moved from $D$ to $A$ if this move does not increase the energy (that is, if $A$ is not empty).

By construction, in both models, the number of empty urns can only increase. The dynamics is frozen when all the urns are empty save one. We can already foresee that model (ii) will be slower to reach this frozen state than model (i) because it is more difficult to empty an urn entirely: when one urn is nearly empty, the ball $b$ belongs to the other urns with much higher probability.
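The two zero-temperature rules can be written as short update functions acting on the vector of occupation numbers. This is only an illustrative sketch: the function names, the initial configuration and the number of steps are arbitrary choices, and the monkey-model rule follows the acceptance condition stated in item (i) ($D$ and $A$ both non-empty).

```python
import random

def energy(urns):
    """Energy of a configuration: the number of non-empty urns."""
    return sum(1 for n in urns if n > 0)

def monkey_step(urns, rng):
    """Model (i): two distinct urns D and A chosen uniformly at random."""
    d, a = rng.sample(range(len(urns)), 2)
    if urns[d] > 0 and urns[a] > 0:
        urns[d] -= 1
        urns[a] += 1

def backgammon_step(urns, rng):
    """Model (ii): a ball chosen uniformly, so urn D is drawn with weight n_D."""
    m = len(urns)
    d = rng.choices(range(m), weights=urns)[0]
    a = rng.choice([u for u in range(m) if u != d])
    if urns[a] > 0:
        urns[d] -= 1
        urns[a] += 1

rng = random.Random(0)
history = {}
for step in (monkey_step, backgammon_step):
    urns = [3, 2, 1, 2]
    energies = [energy(urns)]
    for _ in range(2000):
        step(urns, rng)
        energies.append(energy(urns))
    history[step.__name__] = (urns, energies)
    print(step.__name__, urns, energies[-1])
```

In both cases the total number of balls is conserved and the recorded energy never increases, which is the monotonicity property stated above.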

Note that at $T > 0$, this Metropolis algorithm must be modified as follows: an elementary move is accepted with probability $\min(1, \exp(-\Delta E/T))$, where $\Delta E$ is the energy variation if the move is accepted. The canonical distributions corresponding to the above Hamiltonian are:

$$
\pi_{(i)}(n_1, \ldots, n_m) = \frac{1}{Z_{(i)}} \exp(-E/T)
\tag{53}
$$

for model (i), and

$$
\pi_{(ii)}(n_1, \ldots, n_m) = \frac{1}{Z_{(ii)}} \prod_{i=1}^{m} \frac{1}{n_i!} \exp(-E/T)
\tag{54}
$$

for model (ii), where $n_i$ represents the number of balls in the urn number $i$. These distributions satisfy the detailed balance condition for the Metropolis Markov chain.
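Detailed balance for model (ii) can be verified numerically on a small system. The sketch below enumerates all occupation vectors, builds the Metropolis transition matrix with acceptance probability $\min(1, \exp(-\Delta E/T))$, and checks that the weights of eq. (54) make the probability flux symmetric. The values $m = 3$, $n = 4$, $T = 0.7$ and all function names are arbitrary illustrative choices.

```python
import itertools
import math
import numpy as np

def configurations(m, n):
    """All occupation vectors (n_1, ..., n_m) with n_1 + ... + n_m = n."""
    return [c for c in itertools.product(range(n + 1), repeat=m) if sum(c) == n]

def energy(c):
    return sum(1 for x in c if x > 0)

def backgammon_metropolis(m, n, T):
    states = configurations(m, n)
    index = {c: i for i, c in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for c in states:
        for d in range(m):
            if c[d] == 0:
                continue
            for a in range(m):
                if a == d:
                    continue
                new = list(c)
                new[d] -= 1
                new[a] += 1
                new = tuple(new)
                accept = min(1.0, math.exp(-(energy(new) - energy(c)) / T))
                # ball in D chosen with probability n_D / n, urn A with probability 1/(m-1)
                P[index[c], index[new]] += (c[d] / n) / (m - 1) * accept
    P += np.diag(1.0 - P.sum(axis=1))
    weights = np.array([
        math.exp(-energy(c) / T) / math.prod(math.factorial(x) for x in c)
        for c in states
    ])
    return P, weights / weights.sum()

P, pi = backgammon_metropolis(m=3, n=4, T=0.7)
flux = pi[:, None] * P                      # flux[x, y] = pi(x) P(x, y)
asymmetry = np.abs(flux - flux.T).max()
print(asymmetry)
```

A vanishing asymmetry (up to rounding) means $\pi(x)P(x,y) = \pi(y)P(y,x)$ for every pair of states, which also implies that $\pi$ is stationary.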

First we treat the backgammon model, which displays slow dynamics. The monkey model will be studied at the end of the section. We will prove below that the spectral gap of the backgammon model at $T \to 0$ decays exponentially with $n$. By contrast, the monkey model is rapid: $1/g(P)$ is polynomial in $m$ and $n$. As a consequence, all the glassy character of the backgammon model comes from the distinguishable character of its particles.



4.1. Backgammon model: case m = 2

To begin with, we study the case of $m = 2$ urns, denoted by $U_1$ and $U_2$. At this stage, let us already discuss the following important point: at zero temperature, the fundamental state is twice degenerate: all the balls can be in $U_1$ (state $x_1$) or in $U_2$ (state $x_2$). In other words, the eigenvalue 1 of $P$ is degenerate and at vanishingly small temperature, the spectral gap tends to 0 and the equilibration time diverges.

But what does this equilibration time measure? At very low temperature, the energy landscape is essentially a bistable potential with an energy barrier of height 1. Thus $1/g(P)$ measures the time that the system needs to equilibrate between the two fundamental states $x_1$ and $x_2$, in other words to explore the two potential minima with almost equal probabilities. Naturally, this time diverges like $\exp(1/T)$. But we are not interested in such a time but rather in the typical time to reach one of the two minima, the "absorption time" in Ref. [12]. In other words, at zero temperature, we consider that the system has reached equilibrium when one urn is empty (and when all urns are empty save one if there are several urns). For a finite system, this time remains finite, even at $T = 0$ [12].

† In the monkey model of Ref. [19], the arrival box is non-empty; we get rid of this constraint for the sake of convenience with respect to our method; this modification does not alter the overall conclusions.

Therefore we would like to design a new transition matrix $P'$, the spectral gap of which efficiently measures the previous time at $T = 0$. A solution consists in mixing up both stationary states as follows: when the system has reached one of the states $x_1$ or $x_2$ for the first time at $t_0$, at each time $t > t_0$, it can be flipped to state $x_1$ or $x_2$ with probability 1/2. In other words, we set $P'(x_1,x_2) = P'(x_2,x_1) = P'(x_1,x_1) = P'(x_2,x_2) = 1/2$. As a consequence, at time $t_0 + 1$, the system can have explored the two potential minima with equal probabilities. The unique equilibrium distribution is now $\pi = (x_1 + x_2)/2$, and the spectral gap $g(P')$ now measures the equilibration time we are interested in. We denote by $P^{(2)}$ the previous matrix $P'$. It is a $(n+1) \times (n+1)$ matrix, with elements proportional to the number of balls in the urn $D$:



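$$
P^{(2)} = \frac{1}{n}
\begin{pmatrix}
n/2 & 0 & \cdots & & 0 & n/2 \\
1 & 0 & n-1 & & & \\
 & 2 & 0 & n-2 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & n-1 & 0 & 1 \\
n/2 & 0 & \cdots & & 0 & n/2
\end{pmatrix}
\tag{55}
$$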



We have not succeeded in diagonalizing this matrix. This is not specifically our purpose, since we principally wish to exemplify how the multi-decomposition method relates $m$-urn problems to 2-urn ones. However, high-precision numerical diagonalization up to $n = 200$ shows without ambiguity that $g(P^{(2)}) \simeq 1/2^{n-1}$ when $n$ is large. This value seems to be asymptotically exact. We shall see in the following subsection that all the slow character of the backgammon model lies in this exponentially small 2-urn gap: adding more urns will only modify it by a polynomial factor.
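The numerical observation $g(P^{(2)}) \simeq 1/2^{n-1}$ can be reproduced for moderate $n$ with a few lines of code. The sketch below is written under explicit assumptions: the state is the number $k$ of balls in $U_1$, the chain has no holding probability in the interior (a ball in $U_1$ is chosen with probability $k/n$ and always moves when $0 < k < n$), and the two frozen states are mixed with probability 1/2 as described above. The function names are illustrative.

```python
import numpy as np

def two_urn_matrix(n):
    """Assumed form of P^(2): states k = 0..n, with k the number of balls in U_1."""
    P = np.zeros((n + 1, n + 1))
    for k in range(1, n):
        P[k, k - 1] = k / n          # the chosen ball is in U_1 and moves to U_2
        P[k, k + 1] = (n - k) / n    # the chosen ball is in U_2 and moves to U_1
    for k in (0, n):                 # the two frozen states are mixed up
        P[k, 0] = P[k, n] = 0.5
    return P

def gap(P):
    """1 - (second largest eigenvalue); the matrix is not symmetric, so use eigvals."""
    ev = np.sort(np.linalg.eigvals(P).real)
    return 1.0 - ev[-2]

ratios = {n: gap(two_urn_matrix(n)) * 2 ** (n - 1) for n in (4, 8, 12, 16)}
for n, ratio in ratios.items():
    print(n, ratio)
```

The printed quantity is $g(P^{(2)})\, 2^{n-1}$, which should stay of order one if the exponential law holds. Standard double precision limits this direct approach to moderate $n$, since the gap itself becomes exponentially small.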



4.2. Backgammon model: case m > 2

Now we consider the case $m > 2$ and we relate its dynamics to the previous one, $m = 2$. We denote the urns by $U_1, \ldots, U_m$. We apply the same kind of trick as in the two-urn case to get rid of the degeneracy of the fundamental state: the $m$ lowest energy states, where all urns are empty save one, are mixed up and form a single state. Once all urns are empty save one, all the balls can be transferred to any urn $U_k$ with equal probability $1/m$. This trick ensures that at $T = 0$ the equilibrium state is non-degenerate and that the spectral gap of the new transition matrix measures the relevant equilibration time, that is to say the typical time needed to empty all urns save one.

The multi-decomposition is defined as follows: $\Omega_{k;a}$, $k = 1, \ldots, m$, is the set of all configurations where the urn $U_k$ contains exactly $n_k = a$ balls. In this subset, the number of balls of $U_k$ cannot vary. The integer $a$ can take all the values ranging from 0 to $n$. Now we prove that the multi-decomposition is not degenerate and that $v = 2$, whatever $m > 2$ and $n \geq 2$. We work with the matrix $\Pi$ of eq. (36).

However, one must keep in mind that the whole method relies on a scalar product in which $\pi(x)$ cannot vanish for any $x$. Therefore we must work at $T > 0$, where $\pi(x) > 0$, and then take the limit $T \to 0$ in the gap relation (18). We recall that the coefficients of $\Pi$ are the conditional probabilities at equilibrium that $U_k$ contains $a$ balls given that $U_l$ contains $b$ balls. When $T \to 0$, in regard to eq. (54), the system condenses on its energy minimum inside $\Omega_{l;b}$. The energy is minimized when all the $n - b$ remaining balls are in the same urn. This minimum is $m - 1$ times degenerate. Therefore $a = n - b$ with probability $1/(m-1)$ and $a = 0$ with probability $1 - 1/(m-1)$.

Therefore the limiting matrix $\Pi$ is a $m \times m$ block-matrix like (48), where each block is now a $(n+1) \times (n+1)$ matrix and

$$
B =
\begin{pmatrix}
\frac{m-2}{m-1} & \frac{m-2}{m-1} & \cdots & \frac{m-2}{m-1} & 1 \\
0 & & & \frac{1}{m-1} & 0 \\
\vdots & & ⋰ & & \vdots \\
0 & \frac{1}{m-1} & & & \\
\frac{1}{m-1} & 0 & \cdots & & 0
\end{pmatrix}.
\tag{56}
$$

The matrix $B$ can be diagonalized and its eigenvalues are 1 (non-degenerate), $1/(m-1)$ and $-1/(m-1)$. The corresponding eigenvectors are $(m-1, 0, \ldots, 0, 1)$; the vectors $(1, 0, \ldots, 0, 1)$, $(0, 1, 0, \ldots, 0, 1, 0)$, and so forth; the vectors $(1, 0, \ldots, 0, -1)$, $(0, 1, 0, \ldots, 0, -1, 0)$, and so forth. The exact number of vectors of the last two species depends on the parity of $n$: if $n$ is even, one should add the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$, where the one is at the middle of the vector. The corresponding eigenvalue is $1/(m-1)$. Then the same kind of calculation as in the previous example 3.3 leads to the conclusion that the multi-decomposition is non-degenerate and that $v = 2$.
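The value $v = 2$ can be checked numerically from the limiting conditional probabilities described above. The sketch below assumes the convention $B_{a,b} = \mathrm{Prob}(n_k = a \mid n_l = b)$ at $T \to 0$, identity blocks on the diagonal of $\Pi$, and it reads $v$ as the largest eigenvalue of $\Pi$ other than the trivial one $m$; this reading is an assumption, chosen because it is consistent with the values $v = m/(m-1)$ and $v = 2$ quoted in the text. Function names are illustrative.

```python
import numpy as np

def limiting_block(m, n):
    """B[a, b] = probability that U_k holds a balls given that U_l holds b, at T -> 0."""
    B = np.zeros((n + 1, n + 1))
    for b in range(n + 1):
        B[n - b, b] += 1.0 / (m - 1)       # the n - b other balls all sit in U_k
        B[0, b] += (m - 2.0) / (m - 1)     # ... or in one of the m - 2 other urns
    return B

def limiting_pi(m, n):
    """m x m block matrix: identity blocks on the diagonal, B elsewhere."""
    off_diagonal = np.ones((m, m)) - np.eye(m)
    return np.kron(np.eye(m), np.eye(n + 1)) + np.kron(off_diagonal, limiting_block(m, n))

def top_two_eigenvalues(m, n):
    ev = np.sort(np.linalg.eigvals(limiting_pi(m, n)).real)[::-1]
    return ev[0], ev[1]

for m, n in ((3, 5), (4, 3), (5, 6)):
    print(m, n, top_two_eigenvalues(m, n))
```

Since $\Pi = 1 \otimes 1 + (J - 1) \otimes B$ with $J$ the all-ones $m \times m$ matrix, its eigenvalues are $1 + \mu\lambda$ with $\mu \in \{m-1, -1\}$ and $\lambda$ an eigenvalue of $B$; the eigenvalue $\lambda = 1/(m-1)$ of $B$ is the one that produces the value 2.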

As in the previous example, this multi-decomposition is regular with redundancy $r = m - 2$, since each move between two urns $U_k$ and $U_l$ is forbidden in exactly two decompositions. Since $v = 2$, theorem 1 reads

$$
g(P^{(m)}) \geq \inf_{k=1,\ldots,m} g(P_k).
\tag{57}
$$

Compared with the previous examples, the situation is more complex here, because the matrix $P_k$ is not the direct sum of identical matrices. According to the number $n_k = a$ of balls that are stuck in the urn $U_k$, the number of particles that are effectively involved in the dynamics on the subset $\Omega_{k;a}$ varies. It is equal to $n - a$. Now we prove by induction on $m$ that

$$
g(P^{(m)}) \geq \frac{1}{m-1}\, g(P^{(2)})
\tag{58}
$$

where $g(P^{(2)})$ is the gap of the backgammon model on 2 urns with the same $n$ balls.

This relation is clear for $m = 2$. Suppose it is true for $m - 1$. The matrix $P_k$ in (57) is the direct sum of $n + 1$ matrices of backgammon models on $m - 1$ urns with $n - a$ balls, $a = 0, \ldots, n$. However, their non-diagonal transition rates must be re-scaled by a factor $(n-a)/n$ to get $P_k$, so as to take into account that the number of balls differs in both matrices; and by a factor $(m-2)/(m-1)$ so as to take into account the number of arrival urns. Moreover, up to urn permutations, all matrices $P_k$ are identical and they have the same gaps. Hence

$$
g(P^{(m)}) \geq \inf_{a=0,\ldots,n-1} \frac{n-a}{n}\, \frac{m-2}{m-1}\, g\!\left(P^{(m-1)}_{n-a}\right)
\geq \frac{1}{m-1} \inf_{a<n} \frac{n-a}{n}\, g\!\left(P^{(2)}_{n-a}\right)
\tag{59}
$$

where the subscript $n - a$ means that the system contains $n - a$ balls‡. Since $g(P^{(2)}_{n-a}) \simeq 1/2^{n-a-1}$, the previous infimum is reached when $a = 0$. Hence (58) is verified.

This relation is of crucial importance in the understanding of the origin of glassy 
dynamics: all the slow character (the exponential decay of the gap with n) originates 
from the m = 2 case; adding additional boxes only divides the gap by a polynomial 
factor (see the discussion at the end of this subsection, and Ref. 

We have tested this gap relation numerically on small systems where the transition matrix can easily be fully diagonalized, namely for all $m$ and $n$ such that $m + n < 14$ as well as all $n < 20$ for $m < 4$. In all cases, the previous inequality turns out to be an equality. We are led to the following:

Conjecture 1 The inequality (58) is in fact an equality:

$$
g(P^{(m)}) = \frac{1}{m-1}\, g(P^{(2)}).
\tag{60}
$$

We shall return to this conjecture below.

Usually the time unit in discrete time Markov processes is chosen to depend on the system size in order to be more physical. In the present case, we define the time unit so that the time increment is $\delta t = 1/(m-1)$ at each step of the Markov chain. Then on average, one tries one move per urn and per time unit and one expects that typical times will not depend on the system size in the large size limit; we will see below that this is precisely what happens. Spectral gaps must also be re-scaled to take this point into account: $g'(P) = (m-1)\, g(P)$. Relation (58) becomes

$$
g'(P^{(m)}) \geq g'(P^{(2)}).
\tag{61}
$$

In this section, we have focused on spectral gaps, whereas other references [12, 13, 19] consider absorption or ergodic times $\tau'(m,n)$ instead. Let us make the assumption that $\tau'(m,n) \simeq 1/g'(P^{(m)})$. In the case $m = 2$, we have seen that $g(P^{(2)}) \simeq 1/2^{n-1}$, and it has been calculated that $\tau'(2,n) \simeq 2^{n-1}$. Therefore the previous assumption is exact in this case. Under this assumption, relation (61) becomes $\tau'(2,n) \geq \tau'(m,n)$.

Since the 2-urn dynamics is naturally faster than the $m$-urn one (see [12]), we also naturally assume $\tau'(2,n) \leq \tau'(m,n)$. As a conclusion,

$$
\tau'(2,n) \simeq \tau'(m,n).
\tag{62}
$$

We recover the conclusion of Lipowski [12], which was based on numerical simulations. Under the previous assumption, this relation has the same meaning as conjecture 1.

4.3. Monkey model

Now we sketch the proof that, by contrast, the monkey model is rapid. This property is already present when $m = 2$. Indeed, in that case, the number of balls in each urn increases or decreases by one unit with equal probability at each time step. Therefore this model is equivalent to a random walker with $n + 1$ states with two absorbing states at its extremities. Its spectral gap is

$$
g(P^{(2)}) = 1 - \cos(\pi/n) \simeq \frac{\pi^2}{2n^2}.
\tag{63}
$$

Taking care of transition rates, which are now equal to $1/[m(m-1)]$, all the previous proof can be transposed to the present case. In particular, in regard to eq. (53), $\Pi$ and $v$ are unchanged. One finally gets:

$$
g(P^{(m)}) \geq \frac{2}{m(m-1)}\, g(P^{(2)}) \simeq \frac{1}{m(m-1)}\, \frac{\pi^2}{n^2}.
\tag{64}
$$

The Markov chain is rapid.

‡ We have excluded the value $a = n$ in relation (59). Indeed, if $U_k$ contains $n$ balls, the subset $\Omega_{k;n}$ has only one trivial state and the Markov chain on $\Omega_{k;n}$ is degenerate. In eq. (20), this subset does not provide any eigenvector $f_j$ and therefore does not contribute. Hence it does not appear in the infimum of (59).
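The two-urn monkey gap (63) and the resulting bound (64) are easy to evaluate. The following sketch assumes that the two frozen states are mixed with probability 1/2, exactly as was done for the backgammon model; the function names are illustrative.

```python
import math
import numpy as np

def monkey_two_urn_matrix(n):
    """States k = 0..n (balls in U_1); interior moves +1 or -1 with probability 1/2 each."""
    P = np.zeros((n + 1, n + 1))
    for k in range(1, n):
        P[k, k - 1] = P[k, k + 1] = 0.5
    for k in (0, n):                       # assumed mixing of the two frozen states
        P[k, 0] = P[k, n] = 0.5
    return P

def gap(P):
    ev = np.sort(np.linalg.eigvals(P).real)
    return 1.0 - ev[-2]

def monkey_gap_lower_bound(m, n):
    """Right-hand side of (64), using the exact two-urn gap (63)."""
    return 2.0 / (m * (m - 1)) * (1.0 - math.cos(math.pi / n))

n = 10
numerical = gap(monkey_two_urn_matrix(n))
print(numerical, 1 - math.cos(math.pi / n), math.pi ** 2 / (2 * n ** 2))
print(monkey_gap_lower_bound(m=5, n=n))
```

The bound decreases only polynomially, like $1/(m^2 n^2)$, in contrast with the exponentially small two-urn gap of the backgammon model.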



5. Contingency table problem

Finally, we present partial results on a difficult problem from probability theory: the "Contingency table problem". In this case, we will not be able to calculate the norm $v$ exactly because the matrix $\Pi$ is not as regular as in the previous examples. However, $v$ can be computed numerically for thousands of different examples because these matrices are sufficiently small. This leads us to the conjecture that $v \leq 2$. This upper bound is sufficient to prove that the Markov chains under consideration are rapid (see equations (73) to (76)). Furthermore, as in section 3, the same argument can be applied to two different Markov chains commonly associated with this problem, because $v$ is an equilibrium quantity.

Contingency tables are $m \times n$ matrices of non-negative integers with given positive row and column sums, which arise in statistics as two-way tables to store data from sample collection (see [7] for instance). To test correlations between row and column entries, $\chi^2$-tests are applied to such tables, but they require sampling (almost) uniformly from the set of contingency tables with given row and column sums. Since no systematic way is known to perform this sampling, Monte Carlo Markov chains have been designed to explore these configuration sets randomly. We study two such chains.

Consider a positive integer $\Sigma$ and two sequences of positive integers: the row sums $r_k$, $k = 1, \ldots, m$ and the column sums $c_p$, $p = 1, \ldots, n$ such that

$$
\sum_{k=1}^{m} r_k = \sum_{p=1}^{n} c_p = \Sigma.
\tag{65}
$$

Denote by $\Omega_{m,n}$ the set of all the $m \times n$ contingency tables $(z_{kp})$ such that for all $k$ and all $p$

$$
\sum_{p=1}^{n} z_{kp} = r_k \qquad \text{and} \qquad \sum_{k=1}^{m} z_{kp} = c_p.
\tag{66}
$$

An example of contingency table is displayed in figure 4.

5.1. Diaconis and Saloff-Coste's chain $M_{DS}$

These authors define a discrete time reversible chain $M_{DS}$ on $\Omega_{m,n}$ as follows: at time $t$ the chain is in the state $x = (z_{ij})$. With probability 1/2, do nothing (so that $P(x,x) \geq 1/2$); with probability 1/2, pick uniformly at random two different rows $k < l$ as well as two different columns $p < q$. Choose a number $b \in \{-1, 1\}$ uniformly at random. Define $x' = (z'_{ij})$ by $z'_{ij} = z_{ij}$ for all $i$ and $j$ except that:

$$
\begin{aligned}
z'_{kp} &= z_{kp} + b; \\
z'_{kq} &= z_{kq} - b; \\
z'_{lp} &= z_{lp} - b; \\
z'_{lq} &= z_{lq} + b.
\end{aligned}
\tag{67}
$$

The alternation of signs ensures that the row and column sums are conserved (see figure 4). If $(z'_{ij})$ is a contingency table (that is to say if all coefficients are non-negative), then accept this elementary move; else reject it. This chain $M_{DS}$ is reversible and converges towards the uniform stationary distribution [20]. The non-zero non-diagonal coefficients of its transition matrix $P_{DS}^{(m,n)}$ are all equal to $2/[m(m-1)n(n-1)]$.

Figure 4. An example of contingency table (left) and of elementary move as defined in subsection 5.1. The row and column sums are listed on the right and below the table. The two lines and two columns chosen at random are emphasized, as well as the "boxes" $z_{ij}$ that decrease or increase by one unit. This elementary move does not affect the row and column sums.
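The elementary move of eq. (67) translates directly into code. The sketch below is an illustration only: the function name, the $3 \times 3$ starting table and the number of steps are arbitrary choices that do not come from the paper.

```python
import random

def ds_step(table, rng):
    """One step of a chain of type M_DS on a table stored as a list of rows."""
    m, n = len(table), len(table[0])
    if rng.random() < 0.5:
        return                                   # do nothing with probability 1/2
    k, l = sorted(rng.sample(range(m), 2))       # two different rows k < l
    p, q = sorted(rng.sample(range(n), 2))       # two different columns p < q
    b = rng.choice((-1, 1))
    new = {
        (k, p): table[k][p] + b,
        (k, q): table[k][q] - b,
        (l, p): table[l][p] - b,
        (l, q): table[l][q] + b,
    }
    if min(new.values()) >= 0:                   # accept only valid contingency tables
        for (i, j), value in new.items():
            table[i][j] = value

rng = random.Random(1)
table = [[2, 0, 1], [1, 3, 0], [0, 1, 2]]
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]
for _ in range(1000):
    ds_step(table, rng)
print(table)
```

Whatever the random choices, the row sums, the column sums and the non-negativity of the entries are preserved, which is the invariance illustrated in figure 4.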



The time $\tau$ to reach equilibrium has been calculated in the particular case of two-row contingency tables [21] and is polynomial in the number of columns $n$ and in the table sum $\Sigma$. Extending this result to the general case seems to be a serious challenge.

We build a multi-decomposition by fixing the values of a single row $k$ (or symmetrically of a single column $p$). Let $A_k$ denote the set of all the possible states in $\Omega$ of the whole row $k$. For any such state $a \in A_k$, $\Omega_{k;a}$ is the set of all contingency tables in $\Omega_{m,n}$ the $k$-th row of which is identical to $a$. Then all the subsets $\Omega_{k;a}$ for $k = 1, \ldots, m$ and $a \in A_k$ form a suitable multi-decomposition of $M_{DS}$. Indeed, when a row is held fixed, the resulting chain on $\Omega_{k;a}$ is a chain of type $M_{DS}$ on a set of smaller contingency tables of dimensions $(m-1) \times n$ and of sum $\Sigma' = \Sigma - r_k$. (In a similar way, should the $p$-th column be held fixed, the resulting chain would be a chain of type $M_{DS}$ on a set of smaller contingency tables of dimensions $m \times (n-1)$ and of sum $\Sigma'' = \Sigma - c_p$.) Moreover, this multi-decomposition is regular of redundancy $r = m - 2$ because each move of row indices $k$ and $l$ is forbidden in exactly two decompositions, $(\Omega_{k;a})$ and $(\Omega_{l;b})$.

Unfortunately, the calculation of the norm $v$ of the multi-projection $\Pi$ is much more complicated in this case than in the previous examples because the matrix $\Pi$ lacks regularity. However, thousands of numerical simulations lead us to the following conjecture:

Conjecture 2 In the above "Contingency table problem", whatever the table dimensions $m$ and $n$, whatever the table sum $\Sigma$, whatever the row and column sums $r_k$ and $c_p$, the previous multi-decomposition is non-degenerate and the following inequality holds: $v \leq 2$.

Note that we already know that conjecture 2 is satisfied in a particular case: up to transition rates, the Markov chain on the symmetric group of subsection 3.3 is of $M_{DS}$-type when $m = n$ and the $r_k$ and $c_p$ are all set to 1. Then $\Omega_{m,m}$ is the group of permutation matrices of size $m$, which is isomorphic to $S_m$. We found $v = m/(m-1) \leq 2$.

If this conjecture is true, theorem 1 reads

$$
g(P_{DS}^{(m,n)}) \geq \inf_k g(P_k).
\tag{68}
$$

Now $P_k$ is the direct sum (see eq. (14)) of matrices of Markov chains of type $M_{DS}$ on $(m-1) \times n$ contingency tables, except that their transition rates are equal to $2/[m(m-1)n(n-1)]$ instead of $2/[(m-1)(m-2)n(n-1)]$. Thus

$$
g(P_{DS}^{(m,n)}) \geq \frac{m-2}{m} \inf_k g(P_{DS;k}^{(m-1,n)}).
\tag{69}
$$

We keep track of the index $k$ in this relation because the spectral gap of the restriction of $P$ to the decomposition $(\Omega_{k;a})$ depends on $k$ in general since $r_k$ does. In a similar way, if one chooses to fix columns indexed by $p$,

$$
g(P_{DS}^{(m,n)}) \geq \frac{n-2}{n} \inf_p g(P_{DS;p}^{(m,n-1)}).
\tag{70}
$$

Then we understand that inductively, by successively "removing" rows and columns, one will end up with a Markov chain of type $M_{DS}$ on $2 \times 2$ contingency tables and that the following inequality will hold:

$$
g(P_{DS}^{(m,n)}) \geq \frac{2}{m(m-1)}\, \frac{2}{n(n-1)}
\inf_{\substack{k_1 < \cdots < k_{m-2} \\ p_1 < \cdots < p_{n-2}}}
g\!\left(P^{(2,2)}_{DS;(k_i),(p_i)}\right).
\tag{71}
$$

In this inequality, $P^{(2,2)}_{DS;(k_i),(p_i)}$ is the transition matrix of the Markov chain of type $M_{DS}$ on $2 \times 2$ contingency tables obtained by fixing successively $m - 2$ rows and $n - 2$ columns of the initial $m \times n$ tables§.

Suppose these $2 \times 2$ contingency tables have row sums $r$ and $r'$ and column sums $c$ and $c'$ such that $r + r' = c + c' = \tilde{\Sigma}$. These tables are entirely characterized by one of their coefficients, say $z_{11}$. This coefficient varies by $\pm 1$ at each step of the Markov process. Therefore this chain is a one-dimensional random walker with non-diagonal transition rates equal to 1/4 and "von Neumann" boundary conditions; $z_{11}$ has a minimal value $z_{\min}$ and a maximal one $z_{\max}$ which depend on the precise values of $c$, $c'$, $r$ and $r'$. Nevertheless, one is sure that $z_{\min} \geq 0$ and $z_{\max} \leq \tilde{\Sigma} - 1 \leq \Sigma - 1$, even if these bounds can be very loose. Therefore in all cases

$$
g\!\left(P^{(2,2)}_{DS;(k_i),(p_i)}\right) \geq \frac{1}{2}\left(1 - \cos(\pi/\Sigma)\right)
\tag{72}
$$

and inequality (71) reads

$$
g(P_{DS}^{(m,n)}) \geq \frac{1}{2}\, \frac{2}{m(m-1)}\, \frac{2}{n(n-1)} \left(1 - \cos(\pi/\Sigma)\right).
\tag{73}
$$

As a conclusion, $1/g(P_{DS}^{(m,n)})$ is quadratic in $m$, $n$ and $\Sigma$, provided conjecture 2 is true.

§ It is possible that, during the previous process, after fixing a certain number of rows and columns, one obtains a problem on $m' \times n'$ tables with one (or even more) row (or column) sum equal to 0. In this case, it is useless to apply the multi-decomposition technique, because all the entries of that row (or column) are necessarily equal to 0. By fixing this column, one obtains only one subset $\Omega_{k;a}$ and the corresponding inequality is trivial in this case.

5.2. Dyer and Greenhill's chain $M_{DG}$ [22]

In this type of chain $M_{DG}$, as compared to $M_{DS}$, once the two rows and the two columns have been picked at random, the new table is chosen uniformly among all the tables accessible for any value of $b \in \mathbb{Z}$. In practice, $b$ can take all the integral values between a minimum value $b_{\min}$ and a maximum one $b_{\max}$, which depend on the table, the two chosen rows and the two chosen columns (see the discussion about $z_{11}$ in the previous subsection). Apart from this (important) difference, all the previous development can be applied to the present chain. In particular, the multi-decomposition is adapted to $M_{DG}$. Once again, one notices that the multi-decomposition technique does not depend on the details of the chain, provided the stationary distribution $\pi$ and the subsets $\Omega_{k;a}$ are the same.

However, after removing $m - 2$ rows and $n - 2$ columns, the "ultimate" Markov chain of type $M_{DG}$ on $2 \times 2$ tables will differ from the $M_{DS}$-type one. Indeed, at each step, the new $2 \times 2$ table is chosen uniformly at random among all the $N$ accessible ones. Therefore all the coefficients of the transition matrix $P_{DG}^{(2,2)}$ associated with this chain are equal to $1/N$ and its gap is 1 whatever $N$, and inequality (71) becomes

$$
g(P_{DG}^{(m,n)}) \geq \frac{2}{m(m-1)}\, \frac{2}{n(n-1)}.
\tag{74}
$$

Hence $1/g(P_{DG}^{(m,n)})$ is also quadratic in $m$ and $n$ but not in $\Sigma$, provided conjecture 2 is true. Note that it was already proven rigorously that this chain is polynomial in $n$ when $m$ is held fixed [22, 23].



5.3. A remark about mixing times $\tau_{\mathrm{var}}(\varepsilon)$

We use the relation between mixing times and spectral gaps to calculate an upper bound of the mixing times $\tau_{\mathrm{var}}(\varepsilon)$ for both Markov chains. Since the stationary distribution is uniform, $\ln(1/\pi(x))$ is constant and equal to $\ln N_{m,n}$, where $N_{m,n}$ denotes the cardinality of $\Omega_{m,n}$. Now each integral coefficient $z_{kp}$ of the table is non-negative and is bounded above by the table sum $\Sigma$. Hence $N_{m,n} \leq \Sigma^{mn}$ and $\ln N_{m,n} \leq mn \ln \Sigma$. As a consequence,

$$
\tau_{\mathrm{var}}(\varepsilon) \leq \frac{1}{2}\, m(m-1)\,n(n-1)\, \frac{mn \ln \Sigma + \ln\frac{1}{\varepsilon}}{1 - \cos(\pi/\Sigma)}
\simeq \frac{1}{\pi^2}\, m^3 n^3 \Sigma^2 \ln \Sigma + \frac{1}{\pi^2}\, m^2 n^2 \Sigma^2 \ln\frac{1}{\varepsilon}
\tag{75}
$$

for the Diaconis and Saloff-Coste Markov chain, and

$$
\tau'_{\mathrm{var}}(\varepsilon) \leq \frac{1}{4}\, m^2(m-1)\, n^2(n-1) \ln \Sigma + \frac{1}{4}\, m(m-1)\, n(n-1) \ln\frac{1}{\varepsilon}
\tag{76}
$$

for the Dyer and Greenhill one. As discussed in references [22, 23], the first chain $M_{DS}$ is polynomial in $m$, $n$ and $\Sigma$, whereas the second one $M_{DG}$ is polynomial in $m$ and $n$ but only logarithmic in $\Sigma$ (if conjecture 2 is true). The present method provides a good understanding of this phenomenon by naturally relating the dynamics on $m \times n$ tables to the dynamics on $2 \times 2$ ones.
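To make the different scalings concrete, the bounds (73) to (76) can be evaluated for given table dimensions. The sketch below assumes, as the form of (75) and (76) suggests, a mixing-time bound of the type $\tau_{\mathrm{var}}(\varepsilon) \leq [\ln(1/\pi) + \ln(1/\varepsilon)]/g$ with $\ln(1/\pi) \leq mn \ln \Sigma$; all function names and numerical values are illustrative, and the results are conditional on conjecture 2.

```python
import math

def gap_bound_ds(m, n, total):
    """Right-hand side of (73) for the Diaconis and Saloff-Coste chain."""
    return 0.5 * (2 / (m * (m - 1))) * (2 / (n * (n - 1))) * (1 - math.cos(math.pi / total))

def gap_bound_dg(m, n):
    """Right-hand side of (74) for the Dyer and Greenhill chain."""
    return (2 / (m * (m - 1))) * (2 / (n * (n - 1)))

def mixing_bound(gap, m, n, total, eps):
    """Assumed relation: (ln(1/pi) + ln(1/eps)) / gap, with ln(1/pi) <= m n ln(total)."""
    return (m * n * math.log(total) + math.log(1 / eps)) / gap

m, n, eps = 3, 4, 0.01
for total in (10, 100, 1000):
    ds = mixing_bound(gap_bound_ds(m, n, total), m, n, total, eps)
    dg = mixing_bound(gap_bound_dg(m, n), m, n, total, eps)
    print(total, ds, dg)
```

Increasing the table sum by a factor of 10 multiplies the first bound by roughly 100 (up to the logarithm), while the second one only grows logarithmically.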

To close this section, let us mention the existence of generalizations in higher dimensions of the previous Markov chains on contingency tables [24]. It would be interesting to test whether this technique can be applied to them.

6. Conclusion 

The multi-decomposition technique is a generalization of the "Chain decomposition 
method" of D. Randall which can be more easy to implement in certain cases. Instead 
of a single decomposition, the method uses several intricate ones. It relates the spectral 
gap of a complex Markov chain to the spectral gaps of Markov chain on smaller subsets 
of the configuration space. In particular, a key quantity intervenes in this relation, 
which is calculated at equilibrium, whereas the spectral gap is a dynamical quantity. 

We have illustrated the potential of the multi-decomposition technique on
several examples. The first examples (random walkers and "card shuffling") are
elementary, in the sense that their spectral gaps can be calculated by other means.
However, they are a good illustration of the efficiency of the method. We have also
shown with these examples that the same multi-decomposition can be applied to
different dynamics on the same system.

The last examples are more interesting since they deal with complex Markov
chains of theoretical interest. The "Backgammon model" and the "Monkey model"
belong to the wider class of "Urn models" and are related to the physics of glassy
systems. We have recovered in a rigorous way that the former is slow whereas the
latter is rapid. This fundamental difference is only due to the fact that particles are
distinguishable in the first model and indistinguishable in the second one. These two
examples also gave us the opportunity to introduce temperature in the method. The
"Contingency table problem" is a notoriously difficult problem from probability theory.
We have not been able to solve this problem completely. However, the computational
advantage of the method is illuminating in this case: the matrix $\Pi$ is much smaller than
the original matrix P and can be numerically diagonalized for much larger systems.
These numerical calculations support a conjecture, the validity of which would imply
that the Markov chains of interest are rapid. In this case also, the same
multi-decomposition can be applied to two different dynamics.
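
The numerical diagonalization mentioned above relies only on standard linear algebra. As an illustration (a minimal sketch, not the code used for the calculations of this paper), the spectral gap of a small reversible chain can be computed in Python by symmetrizing its transition matrix with the equilibrium distribution. The function name `reversible_gap` and the two-state example are assumptions made for this illustration.

```python
import numpy as np

def reversible_gap(P, pi):
    """Spectral gap 1 - lambda_2 of a chain P that is reversible with respect to pi."""
    d = np.sqrt(pi)
    # S = D^{1/2} P D^{-1/2} is symmetric exactly when pi_i P_ij = pi_j P_ji.
    S = (d[:, None] * P) / d[None, :]
    if not np.allclose(S, S.T):
        raise ValueError("P is not reversible with respect to pi")
    # S and P share the same eigenvalues; eigvalsh exploits the symmetry of S.
    eigenvalues = np.sort(np.linalg.eigvalsh(S))
    return 1.0 - eigenvalues[-2]

# Two-state example: jump rates a = 0.1 and b = 0.3, equilibrium (b, a) / (a + b).
a, b = 0.1, 0.3
P = np.array([[1.0 - a, a],
              [b, 1.0 - b]])
pi = np.array([b, a]) / (a + b)
print(reversible_gap(P, pi))  # close to a + b, since the eigenvalues are 1 and 1 - a - b
```

Working with the symmetric matrix S rather than with P itself guarantees real eigenvalues and allows the use of symmetric eigensolvers, which matters when the matrix to be diagonalized becomes large.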

In addition, we have remarked that for the first four examples examined in this
paper, the gap relation (18) of theorem 1 turned out to be an equality, since we
obtained the exact gaps at the end of the calculations. It is legitimate to wonder
whether this is always true. Indeed, even if the present version of the theorem is
sufficient to establish that some Markov chains are rapidly mixing, an equality would be
a stronger result than theorem 1. To answer this question, we mention that we know
an example of a 20-state Markov chain where (18) is only an inequality and not an
equality. Hence the question becomes: is it possible to establish a criterion to decide
whether relation (18) is an equality or not? It is difficult to tackle this question
within the framework of our calculations, because inequalities appear in the proof of
theorem 1.